Replication has been shown to be an important tool in the design of
high-performance and highly available distributed systems. When applied to data,
however, replication significantly complicates the problem of maintaining
consistency within a system. This problem is further complicated when repositories
of the data can potentially fail and recover. In this dissertation, we describe
a log-based mechanism for restoring consistent states to replicated data objects
after failures.

A  variety  of  techniques  have  been  proposed  for  implementing  consistency  in 
a  system.  Most  of  these  techniques  focus  on  preserving  a  form  of  consistency 
based  on  serialization  of  updates.  Although  serializable  consistency  is  useful  for 
building  a  large  number  of  applications,  there  are  also  many  applications  that  do 
not  require  the  full  strength  of  consistency  that  serializability  provides.  For  these 
applications,  the  cost  of  implementing  serializable  consistency  can  be  prohibitive. 
A  number  of  weaker  and  less  expensive  consistency  forms  have  therefore  been 
proposed  for  building  such  applications. 


In this dissertation we focus on preserving a causal form of consistency based
on the notion of virtual time. Causal consistency has been shown to apply to
a variety of applications, including distributed simulation, task decomposition,
and mail delivery systems. Several mechanisms have been proposed for implementing
causally consistent recovery, most notably those of Strom and Yemini,
and Johnson and Zwaenepoel. Our mechanism differs from these in two major
respects. First, we implement a roll-forward style of recovery. A functioning
process is never required to roll back its state in order to achieve consistency with
a recovering process. Second, our mechanism does not require any explicit
information about the causal dependencies between updates. Instead, all necessary
dependency information is inferred from the orders in which updates are logged
by the object servers.

Our  basic  recovery  technique  appears  to  be  applicable  to  forms  of  consistency 
other  than  causal  consistency.  In  particular,  we  show  how  our  recovery  technique 
can  be  modified  to  support  an  atomic  form  of  consistency  that  we  call  grouping 
consistency.  By  combining  grouping  consistency  with  causal  consistency,  it  may 
even  be  possible  to  implement  serializable  consistency  within  our  mechanism. 

Chapter 1


Introduction 


Replication  is  an  important  concept  in  the  design  of  fault-tolerant  distributed 
computing  systems.  When  applied  to  object-oriented  systems,  replication  can 
increase  the  availability  as  well  as  the  performance  of  data  objects.  However, 
replication  also  introduces  the  problem  of  maintaining  consistency  between  object 
replicas.  This  problem  is  further  compounded  when  object  replicas  can  fail  and 
recover.  In  this  dissertation  we  present  a  recovery  mechanism  for  restoring  object 
replicas to consistent states after failures.

1.1  Objects  and  Recovery 

In the last several years, object-oriented systems have become increasingly
popular [HMSC88,JLHB87,LCJS87]. These systems provide their users with tools
for  building  and  maintaining  abstract  data  objects.  An  object  in  such  a  system 
generally  consists  of  an  implementation  body  along  with  an  interface.  Only  the 
interface  is  visible  to  a  client  of  the  object;  implementation  details  such  as  data 
structures  and  internal  procedures  are  hidden  from  the  client  inside  the  object 
body.  Figure  1.1  depicts  an  object-oriented  system  containing  two  objects,  a 
name  manager  and  a  resource  allocation  manager,  and  three  clients.  Clients 
begin by registering themselves with the name manager and then proceed to
allocate resources under that name using the resource allocation manager.

Figure 1.1: An object-oriented system (objects shown: Names, Resource Allocation)

Objects  in  a  system  do  not  necessarily  exist  independent  of  one  another.  The 
states  of  different  objects  may  be  related.  In  the  above  example,  the  state  of 
the  resource  allocation  manager  is  dependent  on  the  state  of  the  name  manager; 
resources  are  only  allocated  to  registered  clients.  When  failures  occur,  however, 
consistency  constraints  between  objects  can  be  violated.  If  the  name  manager 
fails  and  subsequently  recovers,  losing  some  client  registrations  in  the  process, 
the system could reflect resource allocations to unregistered clients.

It  is  the  purpose  of  this  dissertation  to  present  an  automatic  mechanism  for 
restoring  consistent  states  to  (replicated)  objects  after  failures.  The  mechanism 
is  based  on  logging  the  sequences  of  updates  that  occur  to  object  replicas  and 
then  using  those  sequences  to  construct  consistent  states  after  failures. 

1.2 Consistency

The meaning of consistency in a system depends upon the application being
implemented. Serializability is perhaps the most widely applied form of consistency
[BG81,Gra78,Ull82]. Under serializability, operations on objects are grouped into
transactions. Each transaction is executed as if it were an atomic unit. If a
failure occurs during a transaction, the result of the transaction is as if either all
of  the  operations  in  the  transaction  occurred  or  none  of  the  operations  occurred. 
Further,  concurrent  transactions  are  executed  as  if  they  occurred  in  some  serial 
order  (in  reality,  the  operations  in  different  transactions  might  be  interleaved). 

Serializability provides a strong consistency condition that is sufficient to
guarantee correctness in a large number of applications. However, for many
applications the cost of implementing serializability is prohibitive. In addition,
serializability  often  provides  a  stronger  consistency  constraint  than  is  required 
by  the  application.  For  these  reasons,  weaker  forms  of  consistency  that  are  less 
expensive  to  implement  have  been  examined. 

In this dissertation we focus on a causal form of consistency based on
Lamport's “happens before” relation [Lam78]. Under causal consistency, operations on
objects are partially ordered according to the virtual time at which they occurred
[Jef85] or the potential flow of information between them [BJ87a]. Objects may
then only be accessed in a manner consistent with this partial ordering.

Compared with serializability, causal consistency has the advantage that it is
inexpensive to implement (causally consistent message ordering can be achieved
using only a one-phase protocol [BJ87b,Sch88,PBS89]). Further, causal
consistency has been shown to be applicable to a large variety of applications, including
mail handling systems [CP86], distributed simulation [J+87], and task
decomposition [BJ87a].

1.3 Objectives

Recovery mechanisms have been proposed elsewhere for achieving causal
consistency in a system [JZ88,SY85]. These mechanisms all require access to explicit
information about the causal dependencies between requests. It is the goal of
this work to show that consistency can be achieved without any such explicit
information. Instead, consistency is achieved using only information inferred from
the normal behavior of the system.

In addition, our mechanism implements a roll-forward style of recovery. Many
existing solutions use rollback as a synchronization technique. However, it is
not always possible to roll back the state of a process or object. For example, the
state of an airline reservation system reflects tickets sold to customers and money
collected from those customers. If a failure occurs, rollback can be used to achieve
consistency within the internal system state, but is likely to leave the state of the
system inconsistent with the external world. In the airline reservation example,
it would be difficult to roll back or undo ticket sales to actual customers. For
this reason, our solution does not require a functioning object server to roll back
its state in order to achieve consistency with a newly recovering server. This is
accomplished at the cost of potentially blocking a server during its recovery.

1.4  Outline 

We begin in chapter 2 by presenting our formal system model, including a
description of log-based recovery and its relationship to causal consistency.

Chapter  3  then  describes  several  consistency  problems  that  can  arise  due  to 
failures  and  outlines  our  basic  recovery  algorithms  for  solving  these  problems. 

In  chapter  4  we  present  transformations  for  consistently  adding  and  deleting 
entries  from  server  logs.  These  transformations  are  used  in  chapter  5  to  construct 
solutions  for  the  recovery  problems  introduced  in  chapter  3. 

When explicit dependency information is not available in a system, our recovery
algorithms can instead use dependency estimates in order to achieve consistency.
These estimates must have the property that they never underestimate
the true set of dependencies. Chapter 6 presents several dependency estimates
with this property. The estimates are divided into two classes: basic and compound.
The compound estimates are more accurate than the basic estimates, but
are also more expensive to compute.

In chapter 7 we discuss several issues concerning the efficiency of the recovery
algorithms. We begin by discussing a cyclic condition that can lead to blocking
during recovery. We show how this condition can be avoided by properly
structuring a system. We then describe a special class of systems that can be
efficiently recovered using the basic estimates, without the possibility of blocking.
We conclude the chapter by outlining the problems involved in implementing
object checkpoints.

Our  basic  recovery  technique  can  be  applied  to  forms  of  consistency  other  than 
causal  consistency.  In  chapter  8  we  describe  how  the  recovery  mechanism  can  be 
modified  to  provide  an  atomic  form  of  consistency  called  grouping  consistency. 

Chapter 9 concludes the dissertation by summarizing the results and
discussing several related areas for future research.


Chapter  2 


Formal  System  Model 


In this chapter we present a partially replicated variant of the client-server model
of computation [BJ87a,BN84,Coo85]. The model is designed to represent a highly
asynchronous system and focuses on those aspects of the system that are relevant
to the recovery of data after a failure. The model uses asynchronously generated
logs to record changes to data and to recover the data after failures. In
addition, we describe notions of correctness and consistency based on causality (or
which events precede others [Lam78]) and discuss their relationship to log-based
recovery.


2.1  Clients  and  Servers 

The active entities in a system are servers and clients. Servers replicate and
maintain data objects that are read and updated by the clients. We let $\mathcal{SERV}$
denote the set of servers in the system and let $OBJS$ denote the set of data
objects managed by the servers. Each object, $A \in OBJS$, is replicated at some
subset of the servers, $\mathcal{SERV}_A$, which we refer to as the server set of the object
($\mathcal{SERV}_A \subseteq \mathcal{SERV}$). For convenience, we will denote the set of objects managed
by a server, $f$, as $OBJS_f$.

$OBJS_f = \{\, A \in OBJS \mid f \in \mathcal{SERV}_A \,\}$
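
As a minimal sketch of this definition, assume the server sets are stored in a dictionary keyed by object name. The names `server_sets` and `objects_of` are hypothetical and chosen only for illustration; the placement of the two objects follows the resource allocation example used in this chapter.

```python
# Hypothetical representation: object name -> set of servers holding a replica.
server_sets = {
    "Names": {"f", "g"},
    "Allocations": {"g", "h"},
}

def objects_of(server, server_sets):
    """Return OBJS_f: the objects whose server set contains `server`."""
    return {obj for obj, servers in server_sets.items() if server in servers}

print(sorted(objects_of("g", server_sets)))  # ['Allocations', 'Names']
```

Storing the server sets per object and deriving the per-server view on demand mirrors the direction of the definition; a real system could equally maintain both maps.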




Figure  2.1:  Overlap  between  object  server  sets 


Figure  2.1  illustrates  the  overlap  between  the  server  sets  of  different  objects 
in  an  example  system.  Depicted  are  the  server  sets  of  four  objects:  A,  B,  C,  and 
D.  Note  that  the  server  set  of  object  D  is  completely  contained  within  the  server 
set  of  object  A. 

A  client  accesses  (reads  or  updates)  an  object  by  broadcasting  its  request  to 
all  servers  managing  a  replica  of  the  object.  Upon  receiving  a  request,  each  server 
makes  the  appropriate  update  to  its  object  replica.  We  assume  that  the  state  of 
a  replica  is  completely  determined  by  the  sequence  of  updates  received  by  the 
replica’s  server  and  that  other  factors,  such  as  the  time  of  an  update’s  receipt  or 
the  timing  between  updates,  do  not  affect  a  replica’s  state.  It  is  not  necessary, 
however,  that  all  servers  receive  requests  in  the  same  order.  Concurrently  issued 
requests  can  be  received  by  different  servers  in  different  orders,  provided  that 
those  orders  lead  to  equivalent  object  states.  This  issue  is  discussed  in  further 
detail  in  section  2.2. 

As  an  example,  consider  a  system  service  for  managing  lists.  This  service 
might  provide  users  with  functions  for  creating  new  lists,  adding  and  deleting 
entries  from  existing  lists,  and  querying  the  contents  of  lists.  One  use  for  such 
a  service  would  be  to  manage  resource  allocations  to  client  processes.  Clients 
would  begin  by  submitting  their  names  to  a  list  of  registered  processes.  Once 
registered, clients could allocate resources by making entries into a resource
allocation list. Such a system is depicted in figures 2.2 and 2.3. In both figures,
the list of registered process names is replicated at servers $f$ and $g$, while the
list of allocated resources is replicated at servers $g$ and $h$. Figure 2.2 depicts the
concurrent submission of two client name registration messages ($reg_1$ and $reg_2$).
Figure 2.3 depicts the concurrent submission of two resource allocation messages
($alc_1$ and $alc_2$). Note that in both examples the concurrent submissions are
received in different orders by the servers.
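
The requirement that different receipt orders lead to equivalent replica states can be illustrated with a small sketch. It assumes, purely for illustration, that a list replica is modeled as a Python set and that updates are hypothetical `("add", entry)` and `("delete", entry)` pairs; none of these names come from the original system.

```python
def apply_updates(updates):
    """Build a replica state by applying updates in the order received."""
    state = set()
    for op, entry in updates:
        if op == "add":
            state.add(entry)
        elif op == "delete":
            state.discard(entry)
        else:
            raise ValueError(f"unknown operation: {op}")
    return state

# Two concurrent registrations received in different orders by two servers.
order_at_f = [("add", "client1"), ("add", "client2")]
order_at_g = [("add", "client2"), ("add", "client1")]

state_f = apply_updates(order_at_f)
state_g = apply_updates(order_at_g)
print(state_f == state_g)  # True: both orders yield the same state
```

In this toy model an add and a delete of the same entry do not commute, which is the kind of request pair that clients would have to order explicitly.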

It  may  seem  unusual  that  a  server  may  manage  replicas  of  multiple  objects. 
However,  in  object-oriented  systems  that  replicate  data,  we  believe  that  such 
overlap between the server sets of objects is common. The work in this dissertation
was motivated by the need to implement failure recovery in the ISIS system
[BCJ+]. In the ISIS system, servers often implement general objects, such as list
management in the previous example. These objects are then used by clients
to  implement  more  specific  services,  such  as  name  management  and  resource 
allocation.  Because  of  availability  and  performance  considerations,  not  all  of  the 
general  servers  may  manage  each  of  the  specific  services.  Further,  the  subset  of 
servers  that  do  manage  a  specific  service  may  dynamically  change  as  servers  fail 
and  recover,  or  as  different  availability  and  performance  constraints  are  placed 
on  the  service.  As  a  result,  general  object  servers  often  manage  multiple  specific 
services. 


2.2  Request  Ordering  and  Causality 

Clients  in  a  system  interact  with  each  other  in  many  ways.  Clients  communicate 
directly  by  sending  messages  to  one  another,  and  indirectly  through  the  objects 
managed by the servers. These interactions may lead to causal dependencies
between the object requests they invoke. For example, in the system of figure 2.3,
two clients may agree to transfer an allocated resource between them. When this
occurs, the allocation service is notified of the transfer through a re-allocation
request message sent by the clients. This re-allocation request is causally
dependent on the original allocation request (as well as on the registration requests
of the clients involved); no server should receive the transfer request until it has
received the clients' registration messages and the resource's initial allocation
message.

Figure 2.2: Concurrent submission of two name registration messages

Figure 2.3: Concurrent submission of two resource allocation messages

Request Structure: $(\mathcal{R}, \prec_{\mathcal{R}})$

$\mathcal{R} = \{reg_1, reg_2, alc_1, alc_2\}$

$reg_1 \prec_{\mathcal{R}} alc_1 \qquad reg_2 \prec_{\mathcal{R}} alc_2$

Figure 2.4: Resource allocation request structure

We  summarize  the  set  of  causal  dependencies  between  the  client  requests 
in  a  system  by  means  of  a  request  structure.  A  request  structure  is  a  logical 
entity  designed  to  represent  the  behavior  of  clients  as  seen  by  an  outside  observer 
looking  back  on  the  system  after  its  completion.  As  such,  the  request  structure 
of  a  system  is  static. 

Definition 2.1

A request structure is a partially ordered set of requests $(\mathcal{R}, \prec_{\mathcal{R}})$.

Here, $\mathcal{R}$ is the set of all requests made by clients in the system and $\prec_{\mathcal{R}}$ relates all
pairs of causally dependent requests. If two requests are related, $x \prec_{\mathcal{R}} y$, then
request $y$ is causally dependent on request $x$. The relation $\prec_{\mathcal{R}}$ is equivalent to
the “happens before” relation of Lamport [Lam78] and, like the “happens before”
relation, $\prec_{\mathcal{R}}$ is transitive and acyclic. $\mathcal{R}$ may contain requests made on many
different objects. For any request, $x \in \mathcal{R}$, we will sometimes use the notation $x.A$
to indicate that request $x$ was made on object $A$. A request structure representing
the dependencies in the resource allocation system is shown in figure 2.4.
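
A request structure can be sketched as a set of requests plus a set of ordered pairs. The following is an illustrative encoding only: the function names `transitive_closure` and `is_acyclic` are hypothetical, and the example data is the request structure of figure 2.4.

```python
from itertools import product

def transitive_closure(pairs):
    """Return the transitive closure of (x, y) pairs, where x precedes y."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

def is_acyclic(closure):
    """A transitively closed relation is acyclic iff no request precedes itself."""
    return all(x != y for x, y in closure)

# Request structure of the resource allocation example.
requests = {"reg1", "reg2", "alc1", "alc2"}
precedes = transitive_closure({("reg1", "alc1"), ("reg2", "alc2")})
print(is_acyclic(precedes))  # True
```

The closure is computed naively because request structures in examples are tiny; it is meant to make the two stated properties (transitive, acyclic) checkable, not to be efficient.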

Recall  that  servers  process  requests  in  the  order  in  which  they  receive  them. 
We assume that in order to construct correct replica states, servers must receive
(process) requests in causally consistent orders (i.e., in orders consistent with
the application's request structure $(\mathcal{R}, \prec_{\mathcal{R}})$). If a server receives two related
(ordered) requests, $x \prec_{\mathcal{R}} y$, then it must receive request $x$ before it receives
request $y$. Unrelated requests may be received by a server in any order and
different servers may even receive the same unrelated requests in different orders.
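
The ordering rule can be checked mechanically when the request structure is known to an outside observer (servers themselves are not assumed to have it). The sketch below is an assumption-laden illustration: `is_causally_consistent` is a hypothetical helper, and requests are represented as plain strings.

```python
def is_causally_consistent(received, precedes):
    """True if no request is received before a request it depends on.

    received: list of requests in the order one server received them.
    precedes: set of (x, y) pairs meaning y causally depends on x.
    """
    position = {req: i for i, req in enumerate(received)}
    for x, y in precedes:
        # Only pairs where the server received both requests are constrained.
        if x in position and y in position and position[x] > position[y]:
            return False
    return True

precedes = {("reg1", "alc1"), ("reg2", "alc2")}
print(is_causally_consistent(["reg2", "reg1", "alc1", "alc2"], precedes))  # True
print(is_causally_consistent(["alc1", "reg1"], precedes))                  # False
```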

We do not assume that servers are given any explicit information about the
dependencies between the requests they receive. In particular, we do not assume
that servers have any explicit knowledge of $(\mathcal{R}, \prec_{\mathcal{R}})$. It is the responsibility of
the clients to ensure that all servers perceive causally consistent request
orderings. A variety of techniques exist for clients to order their requests [BJ87b,
CM84,CASD86,PBS89]. We will not, however, make any assumption about the
mechanism used. Clients may use any technique that guarantees correct request
orderings.

2.3  Failures  and  Recovery 

We  assume  fail-stop  servers  [SS83].  When  a  server  fails,  it  immediately  ceases 
to  receive  and  process  client  requests,  and  the  other  servers  in  the  system  are 
notified  of  its  failure.  In  addition,  the  failed  process  also  loses  the  contents  of  its 
volatile  memory.  We  assume  that  other  types  of  failures,  such  as  send/receive 
omission  failures  [PT86]  or  Byzantine  (malicious)  failures  [LSP82],  do  not  occur. 
We  also  assume  that  network  partitions  [DGMS85]  never  occur,  so  that  non-failed 
servers  can  always  communicate  between  themselves. 

In  order  to  support  recovery  from  failures,  each  server  maintains  a  log  of  the 
object  updates  it  performs. 

Definition 2.2

A log is a totally ordered set of requests $(\mathcal{L}, \rightarrow_{\mathcal{L}})$.

Here, $\mathcal{L}$ is the set of object update requests received by the server and $\rightarrow_{\mathcal{L}}$ is
their order within the log. For the present, logs will be restricted to contain
only requests; they will not contain checkpoints. In any real system checkpoints
are necessary to limit the growth of logs. However, the presence of checkpoints
complicates the problem of recovery and so their use will be postponed until
chapter 7.

Servers  log  requests  in  the  order  in  which  they  receive  them.  Because  servers 
receive  requests  in  causally  consistent  orders,  it  follows  that  servers  log  requests 
in  orders  consistent  with  the  application's  request  structure. 

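Definition 2.3

The log, $(\mathcal{L}_f, \rightarrow_f)$, of a server $f$ is consistent with a request structure,
$(\mathcal{R}, \prec_{\mathcal{R}})$, if

1. $\forall x.A \in \mathcal{L}_f : f \in \mathcal{SERV}_A$

2. $\forall x.A \in \mathcal{L}_f : \forall y.B \in \mathcal{R} :$
   $(y.B \prec_{\mathcal{R}} x.A \wedge f \in \mathcal{SERV}_B) \Rightarrow (y.B \in \mathcal{L}_f \wedge y.B \rightarrow_f x.A)$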
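
The two conditions of Definition 2.3 can be written as a direct check. This is a sketch under stated assumptions: a log is a Python list of `(request, object)` pairs in logged order, `precedes` is a transitively closed set of pairs of such entries, and the helper name `log_is_consistent` is hypothetical.

```python
def log_is_consistent(server, log, server_sets, precedes):
    """Check Definition 2.3 for one server's log.

    log:         list of (request, object) pairs in logged order.
    server_sets: object -> set of servers replicating that object.
    precedes:    set of ((y, B), (x, A)) pairs: y.B precedes x.A.
    """
    position = {entry: i for i, entry in enumerate(log)}
    # Condition 1: every logged request is on an object this server manages.
    if any(server not in server_sets[obj] for _, obj in log):
        return False
    # Condition 2: each predecessor on an object managed by this server
    # must be logged, and logged earlier than its dependent request.
    for earlier, later in precedes:
        if later in position and server in server_sets[earlier[1]]:
            if earlier not in position or position[earlier] > position[later]:
                return False
    return True

server_sets = {"Names": {"f", "g"}, "Allocations": {"g", "h"}}
precedes = {
    (("reg1", "Names"), ("alc1", "Allocations")),
    (("reg2", "Names"), ("alc2", "Allocations")),
}
log_h = [("alc2", "Allocations")]
print(log_is_consistent("h", log_h, server_sets, precedes))  # True
print(log_is_consistent("g", log_h, server_sets, precedes))  # False
```

The same log is consistent for a server that manages only the allocation list but not for one that also manages the name list, because only the latter is obliged to have logged the registration.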

In the treatment that follows, we assume that a request is logged by a server
as soon as it is received and processed, and so the log of a server always
reflects the current states of the server's object replicas. For efficiency, a server
could decouple its execution speed from that of its log by buffering requests in
memory and periodically flushing the buffer to its log. A server's log would then
reflect states that lag behind the actual states of its replicas. Managing a server's
log asynchronously from its replicas does not affect the validity of our results.
However, it would complicate the discussion. If it were really desired to
implement this restriction, a server could use a technique such as write-ahead logging
[BHG87].

Servers in our model do not coordinate their logs with those of other servers.
Each server logs the requests it receives independently of the times when those
requests are logged by other servers. As a result, the state of an object represented
in one log may fall behind the state of that object represented in some other
log. Further, because servers do not always receive requests in the same order,
different servers may have logged different requests for the same object at any
one time.

Figure 2.5: An execution of the resource allocation system (time axis marked at time 0, time $t_1$, and time $t_2$)

Figure 2.5 illustrates one possible execution of the system of figures 2.2
and 2.3. In the figure, horizontal lines represent client and server executions
through time while diagonal arrows represent request message broadcasts.
Depicted are the broadcasts of two name registration messages ($reg_1$ and $reg_2$) and
one resource allocation message ($alc_2$). Note that server $f$ fails at time $t_1$ before
receiving and logging the second registration message, and that server $g$ fails at
time $t_2$ after receiving and logging all three broadcasts. The contents of each
server's log are shown below that server's time line after each request receipt.

Managing server logs asynchronously from one another reduces the system
overhead by decoupling the execution speeds of different servers. Each server is
free to process requests at a rate independent of the other servers. Unfortunately,
as we will see in the next chapter, the use of asynchronous logs leads to
coordination problems between servers after failures. These problems can be avoided
by coordinating the logs of different servers (pessimistic logging techniques exist
for doing this [JZ87,PP83]). However, this adds substantial overhead to the
normal operation of a system. We therefore choose to manage logs asynchronously,
postponing the overhead of coordinating logs until the time of a server's
failure recovery. If failures are rare, this optimistic approach should lead to good
performance of the system.

Other optimistic logging techniques have been proposed for managing
failures in distributed systems [SY85,JZ88]. These techniques involve maintaining
explicit information about the causal dependencies between updates. Managing
such information can be difficult or impossible, though, when the set of clients
is either unknown to the servers or large and dynamically changing. We
therefore examine the problem of optimistic failure recovery in systems where explicit
dependency information is not available.

A  server  uses  its  log  to  recover  from  failures  in  the  usual  way.  In  order  to 
restore  the  state  of  a  failed  object  replica,  a  recovering  server  simply  re-executes 
the  sequence  of  updates  logged  for  the  object.  Once  the  recovering  server  has 
restored  its  (volatile)  replica  of  an  object,  that  server  begins  receiving,  processing, 
and  logging  new  requests  on  the  object.  We  refer  to  a  server  that  is  in  the  process 
of  restoring  its  replica  of  an  object  as  a  recovering  server  of  that  object  and  we 
refer  to  a  server  that  can  process  new  requests  on  an  object  as  an  active  server 
of  the  object. 
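
Log replay for a single object can be sketched as follows. The representation is assumed for illustration only: a log is a list of `(request, object)` pairs, and `recover_replica`, `register`, and the example log are hypothetical names and data, not taken from the ISIS implementation.

```python
def recover_replica(log, obj, apply_update, initial_state):
    """Rebuild the volatile replica of `obj` by re-executing its logged updates."""
    state = initial_state
    for request, target in log:
        if target == obj:
            state = apply_update(state, request)
    return state

def register(names, request):
    """Toy update function: add the request's name to an immutable set."""
    return names | {request}

# Assumed log of a server that manages both the name and allocation lists.
log_g = [("reg1", "Names"), ("reg2", "Names"), ("alc2", "Allocations")]
names = recover_replica(log_g, "Names", register, frozenset())
print(sorted(names))  # ['reg1', 'reg2']
```

Once replay finishes, the server would switch from recovering to active for that object and begin accepting new requests.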


Note that a recovering server does not have to re-execute the updates for an
object in the order in which they were logged. Previously we stated that a server's
object replicas are correct if that server processes requests in causally consistent
orders. Because of this, a recovering server can re-execute logged updates in any
order consistent with the application's request structure, and still reconstruct
valid object replicas. Of course, the order in which a server logs requests is
always consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$, and so this order can be used to construct valid
replica states. This is particularly useful when servers do not have access to
any explicit dependency information, and so cannot determine other valid request
orderings.

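We represent the state of an object reflected in a server's log by the set of
updates it contains for that object.

Definition 2.4

The projection of a log, $(\mathcal{L}_f, \rightarrow_f)$, onto an object, $A \in OBJS$, is

$(\mathcal{L}_f, \rightarrow_f)|_A = \{\, x.A \mid x.A \in \mathcal{L}_f \,\}$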
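
Read as an operation on data, this definition selects the logged requests made on one object and discards their order. A minimal sketch, assuming a log is a list of `(request, object)` pairs and using the hypothetical helper name `project`:

```python
def project(log, obj):
    """Return the set of requests in `log` that were made on object `obj`."""
    return {request for request, target in log if target == obj}

# Assumed example log for a server managing both lists.
log_g = [("reg1", "Names"), ("reg2", "Names"), ("alc2", "Allocations")]
print(sorted(project(log_g, "Names")))  # ['reg1', 'reg2']
```

Returning a set rather than a list reflects that two servers may log the same unrelated requests in different orders and still hold the same object state.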

2.4  System  State  and  Consistency 

The state of a system can be summarized in terms of the contents of the servers'
logs and the status of each server (the log of an active server reflects the actual
states of the server's replicas).

$\mathcal{ACT}_{S/Names} = \emptyset$    $\mathcal{REC}_{S/Names} = \{f\}$    $\mathcal{FAIL}_{S/Names} = \{g\}$

$\mathcal{ACT}_{S/Allocations} = \{h\}$    $\mathcal{REC}_{S/Allocations} = \emptyset$    $\mathcal{FAIL}_{S/Allocations} = \{g\}$

Figure 2.6: A possible state of the resource allocation system


Definition 2.5

A state, $S$, of the system is characterized by the following values:

For each data object, $A \in OBJS$:

$\mathcal{ACT}_{S/A}$ The set of active servers of object $A$.

$\mathcal{REC}_{S/A}$ The set of recovering servers of object $A$.

$\mathcal{FAIL}_{S/A}$ The set of failed servers of object $A$.

For each server, $f \in \mathcal{SERV}$:

$(\mathcal{L}_{S/f}, \rightarrow_{S/f})$ The log of server $f$.

For example, consider again the execution of figure 2.5. Suppose that server $f$
begins to recover at time $t_2$, when server $g$ fails. In this case, figure 2.6 shows
the state, $S$, of the system immediately after time $t_2$.

When  a  server  fails,  it  fails  for  all  objects  it  manages.  When  the  server  later 
recovers,  it  begins  recovering  the  states  of  all  replicas  it  manages. 

$(\exists A \in OBJS : f \in \mathcal{FAIL}_{S/A}) \Rightarrow (\forall A \in OBJS_f : f \in \mathcal{FAIL}_{S/A})$

We denote the complete set of failed servers in state $S$ as $\mathcal{FAIL}_S$.

$\mathcal{FAIL}_S = \bigcup_{A \in OBJS} \mathcal{FAIL}_{S/A}$
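
A system state can be sketched as a small record holding the per-object server sets and the per-server logs. The class name `SystemState`, its field names, and the log contents below are assumptions made for illustration; the active, recovering, and failed sets follow the example in which server f is recovering, server g has failed, and server h is active.

```python
from dataclasses import dataclass

@dataclass
class SystemState:
    active: dict      # object -> set of active servers
    recovering: dict  # object -> set of recovering servers
    failed: dict      # object -> set of failed servers
    logs: dict        # server -> list of (request, object) in logged order

    def failed_servers(self):
        """Union of the per-object failed sets (the complete failed set)."""
        result = set()
        for servers in self.failed.values():
            result |= servers
        return result

state = SystemState(
    active={"Names": set(), "Allocations": {"h"}},
    recovering={"Names": {"f"}, "Allocations": set()},
    failed={"Names": {"g"}, "Allocations": {"g"}},
    logs={  # log contents are assumed, not read from the figure
        "f": [("reg1", "Names")],
        "g": [("reg1", "Names"), ("reg2", "Names"), ("alc2", "Allocations")],
        "h": [("alc2", "Allocations")],
    },
)
print(state.failed_servers())  # {'g'}
```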

In this dissertation, we will be concerned with the problem of maintaining the
overall consistency of a system's state (as well as the consistency of server logs)
when servers fail and recover. There are two aspects to the issue of a system's
overall  consistency.  First,  there  is  the  issue  of  consistency  between  the  replicas  of 
the  same  object.  Second,  there  is  the  issue  of  consistency  between  the  states  of 
different  objects.  We  briefly  discuss  each  of  these  aspects  in  turn.  A  more  formal 
treatment  of  these  issues  is  reserved  for  chapter  3. 

All active servers of an object should maintain equivalent states for their
object replicas, so that the servers behave consistently with respect to one another.
Because  servers  execute  asynchronously  from  one  another,  different  servers  may 
construct  this  state  at  different  speeds  and  by  processing  requests  in  different 
orders.  We  assume  that  at  the  time  a  server  recovers,  all  active  servers  of  an 
object  have  constructed  (and  logged)  equivalent  object  states.  This  state,  which 
we  refer  to  as  the  active  state  of  the  object,  is  the  state  the  recovering  server 
should  restore  to  its  replica. 

Definition 2.6

The active state of an object, $A \in OBJS$, in system state $S$ is

$\mathcal{AS}_{S/A} = (\mathcal{L}_{S/f}, \rightarrow_{S/f})|_A \quad \forall f \in \mathcal{ACT}_{S/A}$
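
The active state can be computed by projecting the log of each active server onto the object and checking that the projections agree. This is only a sketch: `project` and `active_state` are hypothetical helpers, logs are assumed to be lists of `(request, object)` pairs, and returning `None` for an object with no active servers is a choice made here, not something the definition specifies.

```python
def project(log, obj):
    """Set of requests in `log` made on object `obj`."""
    return {request for request, target in log if target == obj}

def active_state(obj, active_servers, logs):
    """Return the common projection of the active servers' logs onto `obj`."""
    projections = [project(logs[s], obj) for s in sorted(active_servers)]
    if not projections:
        return None  # no active server, so no active state in this sketch
    if any(p != projections[0] for p in projections[1:]):
        raise ValueError(f"active servers disagree on the state of {obj}")
    return projections[0]

logs = {
    "f": [("reg1", "Names"), ("reg2", "Names")],
    "g": [("reg2", "Names"), ("reg1", "Names"), ("alc2", "Allocations")],
}
print(sorted(active_state("Names", {"f", "g"}, logs)))  # ['reg1', 'reg2']
```

The two servers logged the registrations in different orders, yet they share one active state because only the set of logged updates is compared.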

Restricting  active  servers  to  equivalent  object  states  (at  the  time  of  a  server 
recovery)  is  reasonable.  For  example,  in  the  ISIS  system  [BJ87b]  process  failure 
and  recovery  events  are  totally  ordered  with  respect  to  all  other  events  (message 
broadcasts)  in  the  system.  Thus,  when  a  server  recovers  from  a  failure,  it  can 
assume  that  all  active  servers  of  an  object  have  received  the  same  set  of  requests 
and thereby constructed the same object state. Note that the restriction on
identical states is only required to hold at the time of a server recovery. At all
other times during the execution of the system, servers are free to maintain their
object replicas asynchronously.

The  second  aspect  to  the  issue  of  a  system’s  overall  consistency  is  consistency 
between the states of different objects. The state of an object should never reflect
a request (update) unless all of the requests on which it is causally dependent
are also reflected in their objects' active states. For example, a system running
under the request structure of figure 2.4 should never be in a state that reflects the
allocation ($alc_1$) made by the first client without reflecting the client's registration
($reg_1$).

A system state, $S$, is said to be observably consistent with a request structure
$(\mathcal{R}, \prec_{\mathcal{R}})$, if the above consistency constraints hold within the active portion of
the  system.  That  is,  a  state  is  consistent  with  a  request  structure  if  all  active 
servers  of  an  object  have  logged  the  same  (valid)  state  for  the  object  and  the 
states  of  all  different  active  objects  are  mutually  consistent.  These  constraints 
are  only  required  to  hold  within  the  active  part  of  a  system  because  this  is  the 
only  portion  of  the  system  visible  to  clients. 

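Definition 2.7

A system state, $S$, is observably consistent with a request structure,
$(\mathcal{R}, \prec_{\mathcal{R}})$, if

1. $\forall f \in \mathcal{SERV} - \mathcal{FAIL}_S : (\mathcal{L}_{S/f}, \to_{S/f})$ is consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$.

2. $\forall A \in OBJS : \forall f, g \in \mathcal{ACT}_{S/A} : (\mathcal{L}_{S/f}, \to_{S/f})|_A = (\mathcal{L}_{S/g}, \to_{S/g})|_A$

3. $\forall A, B \in OBJS \; (\mathcal{ACT}_{S/A} \neq \emptyset \wedge \mathcal{ACT}_{S/B} \neq \emptyset) :$
   $\forall x.A \in AS_{S/A} : \forall y.B \in \mathcal{R} \; (y.B \prec_{\mathcal{R}} x.A) : y.B \in AS_{S/B}$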
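Conditions 2 and 3 of the definition can be illustrated with a small executable sketch. The encoding below is an assumption made only for illustration and is not part of the model: requests are strings, `OBJECT_OF` and `DEPENDS_ON` are hypothetical tables standing in for the request structure, and an object state is approximated by the set of requests logged for that object (log order, and therefore condition 1, is not checked).

```python
from typing import Dict, List, Optional, Set

# Hypothetical encoding: which object each request updates, and which
# requests causally precede it in the request structure.
OBJECT_OF = {"reg1": "Names", "reg2": "Names", "alc1": "Allocations"}
DEPENDS_ON = {"alc1": {"reg1"}}


def restrict(log: List[str], obj: str) -> Set[str]:
    """Project a server's log onto the requests for one object (order ignored)."""
    return {r for r in log if OBJECT_OF[r] == obj}


def active_states(logs: Dict[str, List[str]],
                  active: Dict[str, Set[str]]) -> Optional[Dict[str, Set[str]]]:
    """Return {object: active state}, or None if active replicas disagree (condition 2)."""
    states: Dict[str, Set[str]] = {}
    for obj, servers in active.items():
        views = [restrict(logs[s], obj) for s in servers]
        if not views:
            continue  # inactive objects are not visible to clients
        if any(v != views[0] for v in views):
            return None
        states[obj] = views[0]
    return states


def mutually_consistent(states: Dict[str, Set[str]]) -> bool:
    """Condition 3: each dependency on an active object is reflected in that object's state."""
    for state in states.values():
        for x in state:
            for y in DEPENDS_ON.get(x, set()):
                other = OBJECT_OF[y]
                if other in states and y not in states[other]:
                    return False
    return True


def observably_consistent(logs, active) -> bool:
    """Check conditions 2 and 3 together on the active portion of the system."""
    states = active_states(logs, active)
    return states is not None and mutually_consistent(states)
```

For instance, with server f active for "Names" and server h active for "Allocations", the logs `{"f": ["reg2"], "h": ["alc1"]}` fail the check, because the allocation of client 1 is visible while its registration is not.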

This dissertation presents a recovery mechanism for maintaining observable
consistency in the presence of server failures and recoveries.

2.5 Summary

This  chapter  presented  a  formal  model  of  replicated  data  in  an  asynchronous 
distributed  system.  The  model  was  designed  to  focus  on  those  aspects  of  the 
system  relevant  to  the  recovery  of  data  after  a  failure. 

A system consisted of a set of servers, $\mathcal{SERV}$, replicating a set of data objects,
$OBJS$, along with a set of clients that accessed and updated those objects. A
basic assumption was that objects were partially replicated within larger groups
of servers. This led to arbitrary overlap between the sets of objects individual
servers managed. A client in the system accessed an object by broadcasting a
request message to all servers of the object. An underlying structure, $(\mathcal{R}, \prec_{\mathcal{R}})$,
governed  the  correct  orders  in  which  servers  could  receive  requests.  Because  this 
request  structure  was  unknown  to  the  servers,  it  was  the  responsibility  of  the 
clients  to  ensure  the  servers  perceived  correct  message  orderings. 

In order to support recovery from fail-stop failures, each server maintained a
log of the client requests it received. There was no synchronization
between the logs of different servers. Each server logged requests as soon as they
were  received.  It  was  noted  that  the  order  in  which  requests  appear  within  logs 
is  always  consistent  with  the  application's  request  structure.  After  a  failure,  a 
server  reconstructed  the  states  of  its  object  replicas  by  replaying  the  requests  in 
its  log. 

Servers  could  recover  differing  replica  states  because  logs  were  maintained 
asynchronously.  A  system  was  said  to  be  observably  consistent  if  three  conditions 
held: 

1. The order of requests in all servers' logs (i.e., the states of the servers'
replicas) is consistent with the application's request structure.

2.  All  active  servers  of  an  object  have  logged  (constructed)  the  same  state  for 
the object.

3. The states of all active objects are mutually consistent (i.e., consistent with
respect to the application's request structure).

Developing  a  recovery  mechanism  for  maintaining  this  consistency  is  the  goal  of 
this  dissertation. 


Chapter  3 


Consistency  Problems 


The  use  of  asynchronous  logs  potentially  allows  servers  to  recover  inconsistent 
states after failures. This chapter describes (in outline form) a recovery mechanism
for preventing such inconsistencies. The chapter begins by presenting
several examples of how inconsistencies arise. The behavior of the recovery mechanism
is then formally described and several examples of its operation are given.
This chapter presents only a formal outline of the recovery mechanism. The
implementation of the mechanism is the subject of the remainder of this
dissertation.


3.1  Problem  Examples 

Two  types  of  inconsistencies  can  develop  in  a  system:  those  between  the  states 
of  an  object’s  different  replicas  and  those  between  the  states  of  different  objects. 
We present three examples of such inconsistencies. The first two illustrate
inconsistencies that can develop between an object's replicas. The last illustrates an
inconsistency that can develop between the states of two objects.

3.1.1 Consistency with Active Replicas

Figure 3.1: Inconsistency with an active replica (time axis: 0, $t_1$, $t_2$)

At  the  time  a  server  recovers  from  a  failure,  its  log  reflects  the  states  of  its  object 
replicas from the time of the failure. When the recovering server replays its log,
it restores its replicas into these states. These states may, however, be out of
date  if  other  servers  of  the  objects  remained  active,  processing  updates  after  the 
recovering  server’s  failure.  Such  updates  would  be  reflected  in  the  replicas  of  the 
active  servers,  but  not  in  the  replicas  of  the  recovering  server. 

For example, consider the execution of the resource allocation system shown
in figure 3.1. The execution depicts the transmission of two client registration
messages ($reg_1$ and $reg_2$) and one resource allocation request ($alc_2$). In the figure,
server f receives both registration requests without failing. Server g fails at time
$t_1$ after receiving requests $reg_2$ and $alc_2$, but before receiving request $reg_1$, and
server h fails after receiving request $alc_2$. Suppose that server g recovers after
time $t_1$. Server g will then recover its replica of object "Names" into a state
reflecting only the registration of client 2. It will not recover the registration of
client 1 reflected in the object's active state (the state reflected in the replica of
server f). The contents of both servers' logs at the time of the recovery are shown
below:


Server f (active):      $reg_1$, $reg_2$
Server g (recovering):  $reg_2$, $alc_2$

This  type  of  inconsistency  can  be  prevented  by  transferring  the  active  states 
of  objects  to  the  failed  server  at  the  time  of  recovery.  The  recovering  server  would 
then  alter  its  log  to  reflect  these  transferred  states  so  that  it  restores  them  during 
log  replay.  This  is  the  approach  used  by  ISIS  [BJ87a]  and  will  be  the  approach 
used  in  our  recovery  mechanism. 
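The state-transfer step can be sketched in a few lines. This is only an illustration under assumed names: `OBJECT_OF` is a hypothetical table mapping requests to objects, logs are plain lists, and `join` is not the dissertation's algorithm. It shows just the replacement of the logged portion for each active object and does not attempt to verify that the resulting order respects the request structure.

```python
from typing import Dict, List

# Hypothetical table: which object each request updates.
OBJECT_OF = {"reg1": "Names", "reg2": "Names", "alc2": "Allocations"}


def join(log: List[str], active_states: Dict[str, List[str]]) -> List[str]:
    """Replace the logged requests of every active object with its transferred state.

    Requests on inactive objects are kept so that they remain available
    when those objects are recovered later.
    """
    kept = [r for r in log if OBJECT_OF[r] not in active_states]
    transferred = [r for obj in sorted(active_states) for r in active_states[obj]]
    return transferred + kept


# Server g of figure 3.1 logged reg2 and alc2 before failing, while the
# active state of "Names" (held by server f) reflects both registrations.
new_log = join(["reg2", "alc2"], {"Names": ["reg1", "reg2"]})
```

After the call, the "Names" portion of the recovering server's log matches the active state, and the request on the inactive object "Allocations" is retained.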

3.1.2  Consistency  between  Recovering  Replicas 

A similar type of inconsistency can occur when several servers of an inactive
object (an object for which all servers have failed) recover simultaneously. Because
the servers maintain their logs asynchronously from one another, and because
they probably failed at different times, each server's log probably reflects a
different state of the object. Each server is therefore likely to recover a state for its
object  replica  that  differs  from  (is  inconsistent  with)  the  states  recovered  by  the 
other  servers. 

Figure 3.2: Inconsistency between recovering replicas (time axis: 0, $t_1$, $t_2$, $t_3$)

For example, consider the execution of the resource allocation system shown
in figure 3.2. This execution is similar to the previous one except that server
f fails before receiving registration request $reg_2$. Suppose that both servers f
and g simultaneously recover at some point after time $t_2$. The servers will then
recover inconsistent states for their replicas of "Names". Server f will recover a
state reflecting only the registration of client 1 and server g will recover a state
reflecting only the registration of client 2. This situation is depicted below:

Server f (recovering):  $reg_1$
Server g (recovering):  $reg_2$, $alc_2$

This  inconsistency  problem  can  be  solved  by  having  the  recovering  servers 
choose  a  new  state  for  the  object  and  then  alter  their  logs  so  that  they  all  recover 
this  state  during  log  replay.  Ideally,  this  state  should  be  a  recent  one,  reflecting 
as  many  of  the  client  requests  as  possible.  In  synchronous  systems,  where  the  logs 
of servers are coordinated, the log of the last server to fail [Ske85] will contain
the most recent state of the object. This state could then be used to recover
the failed servers. When logs are not coordinated, however, any server may have
logged the most recent state. Different servers may even have logged different
sets of requests and so no server will have logged the most recent state. In this
case, a recent state of the object can be formed by merging the logged requests
of  the  recovering  servers.  This  is  the  approach  used  by  our  recovery  mechanism. 
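A minimal sketch of such a merge follows. It assumes, for illustration only, that logs are lists of request identifiers and that each log already lists a request after the requests it depends on; `merge_logs` is a hypothetical helper, not the mechanism developed later in the dissertation. Under that assumption, taking requests in order of first appearance keeps every dependency ahead of its dependents in the merged sequence.

```python
from typing import Iterable, List


def merge_logs(logs: Iterable[List[str]]) -> List[str]:
    """Union the requests of several logs, ordered by first appearance."""
    merged: List[str] = []
    seen = set()
    for log in logs:
        for request in log:
            if request not in seen:
                seen.add(request)
                merged.append(request)
    return merged


# Servers f and g of figure 3.2 logged different registrations for "Names".
names_state = merge_logs([["reg1"], ["reg2"]])
```

The merged state reflects both registrations, which neither server had logged on its own.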

3.1.3  Consistency  between  Active  Objects 

The  previous  two  examples  illustrated  consistency  problems  that  develop  between 
different  replicas  of  a  single  object.  Because  dependencies  can  exist  between 
requests on different objects, inconsistencies can also develop between the states
of different objects. Let S denote a state of a system in which some failed server
f is recovering its replica of an object, A, and in which some other object, B, is
active. If the state of object A logged by server f is old, f may recover a state
that does not reflect all of the updates on which the active state of B ($AS_{S/B}$)
depends. Similarly, if the active state of B is old (i.e., it is the result of a previous
failure recovery of its servers), it may be missing updates on which the state of
A recovered by server f depends.

As an example, consider again the execution shown in figure 3.2. If servers f
and h recover at some point after time $t_3$, they will recover mutually inconsistent
states. Server h will recover an allocation request ($alc_2$) from a client whose
registration ($reg_2$) is not recovered by server f. That is, the servers will recover
a state that reflects a client's allocation without reflecting the registration on
which it depends. Shown below are the logs of the two servers at the time of the
recovery:

Server f (recovering):  $reg_1$
Server h (recovering):  $alc_2$

Inconsistencies  between  different  objects  are  the  most  difficult  ones  to  prevent 
in  a  system,  and  are  the  focus  of  the  recovery  mechanism. 

3.2  Recovery  Mechanism 

In  order  to  preserve  consistency  within  a  system,  a  recovering  server  must  be 
careful  about  the  states  it  restores  to  its  object  replicas.  A  recovering  server  must 
restore  replicas  of  active  objects  using  those  objects’  current  states.  A  recovering 
server  must  also  restore  replicas  of  inactive  objects  to  states  consistent  with  the 
rest of the system (e.g., the state must agree with those of other recovering replicas
of  the  object,  and  the  state  must  be  consistent  with  the  states  of  other  active 
objects  in  the  system). 

Our  recovery  mechanism  enforces  these  constraints  in  two  phases.  In  the 
first  phase,  a  failed  server’s  replicas  of  active  objects  are  restored  to  the  objects' 
current  states  in  the  system.  We  refer  to  this  as  the  server’s  JOIN  phase.  Once 
the  server  has  completed  its  JOIN  phase,  its  replicas  of  inactive  objects  are 
restored  to  states  consistent  with  the  state  of  the  system.  We  refer  to  this  as 
the  server’s  ACTIVATE  phase.  Figure  3.3  illustrates  the  relationship  of  the  two 
recovery  phases.  The  behaviors  of  the  two  phases  are  formally  outlined  in  the 
following sections.

JOIN Phase: (immediately upon recovery)

1. for each $A \in OBJS_f \; (\mathcal{ACT}_{S/A} \neq \emptyset)$
       alter $(\mathcal{L}_{S/f}, \to_{S/f})$ so that
           $(\mathcal{L}_{S/f}, \to_{S/f})|_A = AS_{S/A}$

2. reconstruct replicas of active objects from $(\mathcal{L}_{S/f}, \to_{S/f})$

3. begin processing new requests on active objects

ACTIVATE Phase: (upon completion of JOIN phase)

4. while $\exists A \in OBJS_f \; (\mathcal{ACT}_{S/A} = \emptyset)$
       wait for all $g \in \mathcal{REC}_{S/A}$ to complete their JOIN phases
       construct a new state, $S_A$, for object A by merging the logs
           of all members of $\mathcal{REC}_{S/A}$
       if $S_A$ is inconsistent with the state of any active object
           then abort activation of A until additional servers recover
       activate object A by:
           altering $(\mathcal{L}_{S/f}, \to_{S/f})$ so that
               $(\mathcal{L}_{S/f}, \to_{S/f})|_A = S_A$
           reconstruct replica of A from $(\mathcal{L}_{S/f}, \to_{S/f})$
           begin processing new requests on A

Figure 3.3: Recovery sequence of server f in state S

The recovery sequence of a server is divided into two phases for several reasons.
The JOIN phase provides a server with information about the states of
some of the active objects in the system. This information is used in the ACTIVATE
phase to ensure that only consistent states are recovered for inactive
objects. A consistent state cannot always be recovered, however, for an inactive
object; moreover, the ACTIVATE phase cannot always determine (based on the
dependency information available to it) if the state it constructed for an object
is consistent with the states of all active objects. When it cannot determine
the consistency of a state, the ACTIVATE phase must temporarily abort the
recovery of an object until other servers recover, providing additional dependency
information.  The  JOIN  phase,  on  the  other  hand,  never  needs  to  abort  and  so  it 
is  separated  from  the  ACTIVATE  phase. 

3.2.1  JOIN  Phase  Outline 

When  a  server  begins  recovering  from  a  failure,  its  status  is  upgraded  from  a 
failed  server  to  a  recovering  server  for  each  object  it  manages.  The  JOIN  phase 
is  responsible  for  bringing  the  state  of  a  newly  recovering  server  up  to  date 
with  respect  to  the  states  of  active  objects  in  the  system.  The  current  states 
of  active  objects  are  transferred  from  the  active  servers  to  the  recovering  server 
and  the  recovering  server’s  log  is  altered  to  reflect  these  current  object  states. 
The  recovering  server’s  replicas  are  then  restored  by  replaying  the  appropriate 
portion  of  the  log  and  the  server  begins  processing  new  client  requests  on  the 
objects. 

The  changes  that  occur  to  the  system  state  as  a  result  of  the  JOIN  phase 
are  summarized  in  definition  3.1.  Note  that  the  only  portion  of  the  state  that 
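changes is the portion related to the recovering server (f).

Definition 3.1

A state, $T$, solves the JOIN problem for server $f \in \mathcal{SERV}$ in state $S$ under
request structure $(\mathcal{R}, \prec_{\mathcal{R}})$ if $T$ satisfies the following conditions:

JC1. $(\mathcal{L}_{T/f}, \to_{T/f})$ is consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$.

JC2. The new log of server f reflects the current states of active objects.

     $\forall A \in OBJS_f \; (\mathcal{ACT}_{S/A} \neq \emptyset) : (\mathcal{L}_{T/f}, \to_{T/f})|_A = AS_{S/A}$

JC3. The only log that changes is that of server f.

     $\forall g \in \mathcal{SERV} \; (g \neq f) : (\mathcal{L}_{T/g}, \to_{T/g}) = (\mathcal{L}_{S/g}, \to_{S/g})$

JC4. Server f changes from a recovering to an active server of the active
     objects.

     $\forall A \in OBJS \; (\mathcal{ACT}_{S/A} \neq \emptyset) :$
         $f \in \mathcal{SERV}_A \Rightarrow (\mathcal{ACT}_{T/A} = \mathcal{ACT}_{S/A} \cup \{f\} \wedge \mathcal{REC}_{T/A} = \mathcal{REC}_{S/A} - \{f\})$
         $f \notin \mathcal{SERV}_A \Rightarrow (\mathcal{ACT}_{T/A} = \mathcal{ACT}_{S/A} \wedge \mathcal{REC}_{T/A} = \mathcal{REC}_{S/A})$

     $\forall A \in OBJS \; (\mathcal{ACT}_{S/A} = \emptyset) :$
         $(\mathcal{ACT}_{T/A} = \mathcal{ACT}_{S/A} \wedge \mathcal{REC}_{T/A} = \mathcal{REC}_{S/A})$

JC5. The set of failed servers remains the same.

     $\forall A \in OBJS : \mathcal{FAIL}_{T/A} = \mathcal{FAIL}_{S/A}$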

In  addition  to  meeting  these  conditions,  the  new  log  of  server  /  should  also 
be as complete as possible. The new log should retain as many of the old log's
entries as possible. This allows the ACTIVATE phase to recover inactive objects
into the most recent state possible. Although we will not formalize this condition,
we do wish to point it out as a goal.

As  shown  in  the  following  theorem,  the  JOIN  phase  preserves  consistency 
within  a  system. 

Theorem  3.1 

If $S$ is a state that is observably consistent with a request structure $(\mathcal{R}, \prec_{\mathcal{R}})$,
and if $T$ is a state that solves the JOIN problem for server $f \in \mathcal{SERV}$ in state
$S$, then $T$ is also observably consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$.

Proof: In order to prove that T is observably consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$ we
must show three things. First, we must show that all servers' logs are consistent
with the request structure. From condition JC1 of the JOIN phase definition
we know that the log of server f (in state T) is consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$. From
condition JC3 we know that the logs of all other servers remain unchanged from
state S, in which they were all consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$ by premise. The logs of
all servers in state T are therefore consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$.

Next,  we  must  show  that  all  active  servers  of  an  object  reflect  the  same  state 
for the object. Let $A \in OBJS$ be any active object (i.e., $\mathcal{ACT}_{S/A} \neq \emptyset$). We
assume that f is not actively serving object A in state S (i.e., $f \notin \mathcal{ACT}_{S/A}$),
otherwise it would not need to solve its JOIN problem. By premise, S is an
observably consistent state and so all active servers of A in S have logged the
same object state.

$\forall g \in \mathcal{ACT}_{S/A} : (\mathcal{L}_{S/g}, \to_{S/g})|_A = AS_{S/A}$

Because $f \notin \mathcal{ACT}_{S/A}$, it follows from condition JC3 that the logs of all servers
in $\mathcal{ACT}_{S/A}$ remain unchanged between states S and T.

$\forall g \in \mathcal{ACT}_{S/A} : (\mathcal{L}_{T/g}, \to_{T/g}) = (\mathcal{L}_{S/g}, \to_{S/g})$

Combining these two equations we see that all active servers of A in state S have
still logged the same object state in state T.

$\forall g \in \mathcal{ACT}_{S/A} : (\mathcal{L}_{T/g}, \to_{T/g})|_A = AS_{S/A}$    (3.1)

Now, there are two cases: either f is a (recovering) server of A or it is not.
Suppose f is a server of A. From condition JC2 we know that

$(\mathcal{L}_{T/f}, \to_{T/f})|_A = AS_{S/A}$    (3.2)

Combining equations 3.1 and 3.2 we get

$\forall g \in \mathcal{ACT}_{S/A} \cup \{f\} : (\mathcal{L}_{T/g}, \to_{T/g})|_A = AS_{S/A}$    (3.3)

From condition JC4 we know that

$\mathcal{ACT}_{T/A} = \mathcal{ACT}_{S/A} \cup \{f\}$

Substituting this into equation 3.3 we get the desired result that all active servers
of A in state T reflect the same state for the object.

$\forall g \in \mathcal{ACT}_{T/A} : (\mathcal{L}_{T/g}, \to_{T/g})|_A = AS_{S/A}$    (3.4)

Now suppose that f is not a server of object A. From condition JC4 we know
that $\mathcal{ACT}_{T/A} = \mathcal{ACT}_{S/A}$. Substituting this into equation 3.1 we see again that
all servers of A are consistent.

$\forall g \in \mathcal{ACT}_{T/A} : (\mathcal{L}_{T/g}, \to_{T/g})|_A = AS_{S/A}$    (3.5)

The  last  thing  we  must  show  in  order  to  prove  the  observable  consistency 
of T is that the states of all active objects are mutually consistent. Because S
is an observably consistent state we know that all active objects are mutually
consistent in state S.

$\forall A, B \in OBJS \; (\mathcal{ACT}_{S/A} \neq \emptyset \wedge \mathcal{ACT}_{S/B} \neq \emptyset) :$    (3.6)
    $\forall x.A \in AS_{S/A} : \forall y.B \in \mathcal{R} \; (y.B \prec_{\mathcal{R}} x.A) : y.B \in AS_{S/B}$

From condition JC4 it follows that any object that is active in state S is also
active in state T and that there are no new active objects in state T.

$\forall A \in OBJS : \mathcal{ACT}_{S/A} \neq \emptyset \Leftrightarrow \mathcal{ACT}_{T/A} \neq \emptyset$

Substituting this into equation 3.6 we see that all active objects in state T were
mutually consistent in state S.

$\forall A, B \in OBJS \; (\mathcal{ACT}_{T/A} \neq \emptyset \wedge \mathcal{ACT}_{T/B} \neq \emptyset) :$    (3.7)
    $\forall x.A \in AS_{S/A} : \forall y.B \in \mathcal{R} \; (y.B \prec_{\mathcal{R}} x.A) : y.B \in AS_{S/B}$

From equations 3.4 and 3.5 we see that the states of all active objects remain
unchanged between states S and T.

$\forall A \in OBJS \; (\mathcal{ACT}_{T/A} \neq \emptyset) : AS_{T/A} = AS_{S/A}$

Substituting this into equation 3.7 we get the desired result.

$\forall A, B \in OBJS \; (\mathcal{ACT}_{T/A} \neq \emptyset \wedge \mathcal{ACT}_{T/B} \neq \emptyset) :$    (3.8)
    $\forall x.A \in AS_{T/A} : \forall y.B \in \mathcal{R} \; (y.B \prec_{\mathcal{R}} x.A) : y.B \in AS_{T/B}$

That is, the states of all active objects are mutually consistent in state T. □


3.2.2  ACTIVATE  Phase  Outline 

The  ACTIVATE  phase  is  responsible  for  recovering  a  server’s  replicas  of  inactive 
objects.  A  server  does  not  begin  its  ACTIVATE  phase  until  it  has  completed  its 
JOIN  phase.  Inactive  objects  are  recovered  one  at  a  time  and  a  server  coordinates 
its  recovery  of  an  inactive  object  with  those  of  the  other  recovering  servers  of  the 
object  (once  they  have  completed  their  JOIN  phases).  In  order  to  restore  an 
inactive  object,  the  recovering  servers  first  agree  on  a  new  state  for  the  object 
(one  that  is  consistent  with  the  states  of  all  other  active  objects  in  the  system) 
and then alter their logs to reflect this new state. The servers then restore their
replicas by replaying the appropriate portions of their logs and begin to receive
and  process  new  client  requests  on  the  object. 

The  changes  that  occur  to  the  system  state  as  a  result  of  the  ACTIVATE 
phase  are  shown  in  definition  3.2.  Note  that  the  only  portion  of  the  state  that 
changes is the portion related to the recovering servers of the inactive object (A).

Definition 3.2

A state, $T$, solves the ACTIVATE problem for object $A \in OBJS$ in state $S$
under request structure $(\mathcal{R}, \prec_{\mathcal{R}})$ if $T$ satisfies the following conditions:

AC1. The new logs of the recovering servers, $(\mathcal{L}_{T/f}, \to_{T/f}) \; \forall f \in \mathcal{REC}_{S/A}$,
     are consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$.

AC2. The recovering servers of A agree on the object's new state.

     $\forall f, g \in \mathcal{REC}_{S/A} : (\mathcal{L}_{T/f}, \to_{T/f})|_A = (\mathcal{L}_{T/g}, \to_{T/g})|_A$

AC3. The new state for object A is consistent with the states of all other
     active objects.

     $\forall B \in OBJS \; (\mathcal{ACT}_{T/B} \neq \emptyset) :$
         $\forall x.A \in AS_{T/A} : \forall y.B \in \mathcal{R} \; (y.B \prec_{\mathcal{R}} x.A) : y.B \in AS_{T/B}$ and
         $\forall y.B \in AS_{T/B} : \forall x.A \in \mathcal{R} \; (x.A \prec_{\mathcal{R}} y.B) : x.A \in AS_{T/A}$

AC4. The new logs of the recovering servers preserve the states of any
     previously active objects.

     $\forall f \in \mathcal{REC}_{S/A} : \forall B \in OBJS_f \; (f \in \mathcal{ACT}_{S/B}) :$
         $(\mathcal{L}_{T/f}, \to_{T/f})|_B = (\mathcal{L}_{S/f}, \to_{S/f})|_B$

AC5. The only logs affected are those of the recovering servers of A.

     $\forall f \in \mathcal{SERV} - \mathcal{REC}_{S/A} : (\mathcal{L}_{T/f}, \to_{T/f}) = (\mathcal{L}_{S/f}, \to_{S/f})$

AC6. The recovering servers of A become active servers of the object.

     $\mathcal{ACT}_{T/A} = \mathcal{REC}_{S/A}$    $\forall B \in OBJS - \{A\} : \mathcal{ACT}_{T/B} = \mathcal{ACT}_{S/B}$
     $\mathcal{REC}_{T/A} = \emptyset$    $\forall B \in OBJS - \{A\} : \mathcal{REC}_{T/B} = \mathcal{REC}_{S/B}$

AC7. The set of failed servers remains the same.

     $\forall A \in OBJS : \mathcal{FAIL}_{T/A} = \mathcal{FAIL}_{S/A}$


In addition to meeting these conditions, the recovering servers' new logs
should also be as complete as possible, reflecting as many of the previously
logged requests as possible. In addition, the new state constructed for object
A should be as up to date as possible. The state should reflect all of the logged
requests  from  the  time  of  recovery  that  are  consistent  with  the  current  system 
state.  Again,  however,  we  will  not  formalize  these  conditions.  We  present  them 
only  as  design  goals. 

The following theorem shows that the ACTIVATE phase preserves consistency
within a system.

Theorem 3.2

If $S$ is a state that is observably consistent with a request structure $(\mathcal{R}, \prec_{\mathcal{R}})$,
and if $T$ is a state that solves the ACTIVATE problem for object $A \in OBJS$
in state $S$, then $T$ is also observably consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$.

Proof: A state is observably consistent with a request structure if it has
three properties. First, the logs of all servers in the new state must be consistent
with $(\mathcal{R}, \prec_{\mathcal{R}})$. From condition AC1 of the ACTIVATE phase definition we know
that the logs of all recovering servers of object A, in state T, are consistent
with $(\mathcal{R}, \prec_{\mathcal{R}})$. From condition AC5 we know that the logs of all other servers
remain unchanged from state S, in which they were consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$
by premise. The logs of all servers in state T are therefore consistent with the
request structure.

Next,  in  order  for  a  state  to  be  observably  consistent,  all  active  servers  of  an 
object  must  reflect  (have  logged)  the  same  object  state.  To  see  that  this  property 
holds  in  state  T ,  first  consider  object  A.  From  condition  AC6  we  know  that  the 
only  active  servers  of  object  A  in  state  T  are  the  servers  that  were  recovering  in 
state  S. 

$\mathcal{ACT}_{T/A} = \mathcal{REC}_{S/A}$

From condition AC2 we know that these servers reflect the same object state for
A in state T.

$\forall f, g \in \mathcal{REC}_{S/A} : (\mathcal{L}_{T/f}, \to_{T/f})|_A = (\mathcal{L}_{T/g}, \to_{T/g})|_A$

Now, consider any other active object B ($\mathcal{ACT}_{T/B} \neq \emptyset$) in state T. It follows
from condition AC6 that the set of active servers of B remains unchanged between
states S and T.

$\forall B \in OBJS - \{A\} \; (\mathcal{ACT}_{T/B} \neq \emptyset) : \mathcal{ACT}_{S/B} = \mathcal{ACT}_{T/B}$    (3.9)

Because S was an observably consistent state, it follows that all of these servers
reflected the same object state for B in state S.

$\forall f, g \in \mathcal{ACT}_{T/B} : (\mathcal{L}_{S/f}, \to_{S/f})|_B = (\mathcal{L}_{S/g}, \to_{S/g})|_B$    (3.10)

From condition AC4 we know that the set of logged requests for object B does
not change between states S and T at any of the active servers of B that are
recovering servers of A.

$\forall f \in \mathcal{ACT}_{T/B} \cap \mathcal{REC}_{S/A} : (\mathcal{L}_{T/f}, \to_{T/f})|_B = (\mathcal{L}_{S/f}, \to_{S/f})|_B$    (3.11)

From condition AC5 we know that the logs of the other active servers of B (those
that are not recovering servers of A) do not change between states S and T and
so the set of requests they have logged for B remains the same.

$\forall f \in \mathcal{ACT}_{T/B} - \mathcal{REC}_{S/A} : (\mathcal{L}_{T/f}, \to_{T/f})|_B = (\mathcal{L}_{S/f}, \to_{S/f})|_B$    (3.12)

Combining equations 3.11 and 3.12 we see that all active servers of B have logged
the same set of requests for B in both states S and T.

$\forall f \in \mathcal{ACT}_{T/B} : (\mathcal{L}_{T/f}, \to_{T/f})|_B = (\mathcal{L}_{S/f}, \to_{S/f})|_B$    (3.13)

Substituting the result of equation 3.13 into equation 3.10 we see that all active
servers of B reflect the same object state in state T.


$\forall f, g \in \mathcal{ACT}_{T/B} : (\mathcal{L}_{T/f}, \to_{T/f})|_B = (\mathcal{L}_{T/g}, \to_{T/g})|_B$

The last property of observable consistency is that the states of all active
objects are mutually consistent. To see that this property holds in the new state,
T, consider first any two active objects, $B, C \in OBJS - \{A\}$, other than A
($\mathcal{ACT}_{T/B} \neq \emptyset$ and $\mathcal{ACT}_{T/C} \neq \emptyset$). From equation 3.9 we know that the set of
active servers of these objects does not change between states S and T.

$\mathcal{ACT}_{S/B} = \mathcal{ACT}_{T/B}$    $\mathcal{ACT}_{S/C} = \mathcal{ACT}_{T/C}$

Because S was an observably consistent state, we also know that the active states
of these objects were mutually consistent in state S.

$\forall B, C \in OBJS - \{A\} \; (\mathcal{ACT}_{T/B} \neq \emptyset \wedge \mathcal{ACT}_{T/C} \neq \emptyset) :$    (3.14)
    $\forall y.B \in AS_{S/B} : \forall z.C \in \mathcal{R} \; (z.C \prec_{\mathcal{R}} y.B) : z.C \in AS_{S/C}$

From equation 3.13 we know that the states of these active objects do not change
between states S and T.

$\forall B \in OBJS - \{A\} \; (\mathcal{ACT}_{T/B} \neq \emptyset) : AS_{T/B} = AS_{S/B}$    (3.15)

They must therefore remain mutually consistent in state T.

$\forall B, C \in OBJS - \{A\} \; (\mathcal{ACT}_{T/B} \neq \emptyset \wedge \mathcal{ACT}_{T/C} \neq \emptyset) :$
    $\forall y.B \in AS_{T/B} : \forall z.C \in \mathcal{R} \; (z.C \prec_{\mathcal{R}} y.B) : z.C \in AS_{T/C}$

It follows that any inconsistency between object states in T must involve object
A. However, from condition AC3 we know that the active state of A is consistent
with the active states of all other objects. The states of all active objects are
therefore mutually consistent in state T. □

3.3  Recovery  Examples 

As an example of the recovery mechanism's behavior, consider again the execution
of the resource system shown in figure 3.2. Suppose that server f is the first
server to recover after time $t_3$. At the time server f recovers, the state of the
system will be:

$\mathcal{ACT}_{S/Names} = \emptyset$          $\mathcal{REC}_{S/Names} = \{f\}$          $\mathcal{FAIL}_{S/Names} = \{g\}$
$\mathcal{ACT}_{S/Allocations} = \emptyset$    $\mathcal{REC}_{S/Allocations} = \emptyset$    $\mathcal{FAIL}_{S/Allocations} = \{g, h\}$

$(\mathcal{L}_{S/f}, \to_{S/f})$:  $reg_1$
$(\mathcal{L}_{S/g}, \to_{S/g})$:  $reg_2$, $alc_2$
$(\mathcal{L}_{S/h}, \to_{S/h})$:  $alc_2$

Because no objects are active when f recovers, the JOIN phase of f will not
take any actions. During its ACTIVATE phase, however, server f will recover
its replica of object "Names". Because no objects are active, server f is free
to recover any valid state of "Names" for its replica; it does not have to be
concerned with ensuring consistency with the states of any other active objects.
Server f therefore recovers its replica using the state reflected in its log (the state
reflecting only the registration of client 1). The resulting state is shown below:

$\mathcal{ACT}_{S/Names} = \{f\}$          $\mathcal{REC}_{S/Names} = \emptyset$          $\mathcal{FAIL}_{S/Names} = \{g\}$
$\mathcal{ACT}_{S/Allocations} = \emptyset$    $\mathcal{REC}_{S/Allocations} = \emptyset$    $\mathcal{FAIL}_{S/Allocations} = \{g, h\}$

$(\mathcal{L}_{S/f}, \to_{S/f})$:  $reg_1$
$(\mathcal{L}_{S/g}, \to_{S/g})$:  $reg_2$, $alc_2$
$(\mathcal{L}_{S/h}, \to_{S/h})$:  $alc_2$

Now, suppose that server h is the next server to recover. Again, no objects served by h are active at the time of the recovery and so the server's JOIN phase will not take any actions. Instead, server h's replica of “Allocations” is recovered during its ACTIVATE phase. Unlike the recovery of object “Names” by server f, however, server h is not free to recover any state for object “Allocations”; it must ensure that the state recovered is one that is consistent with the state of the now active object “Names”. Server h must therefore delete request alc2 from its log because the registration of client 2 is not reflected in the active state of the system. The state of the system resulting from the recovery of h will then be:

$ACT_{S/Names} = \{f\} \quad REC_{S/Names} = \emptyset \quad FAIL_{S/Names} = \{g\}$
$ACT_{S/Allocations} = \{h\} \quad REC_{S/Allocations} = \emptyset \quad FAIL_{S/Allocations} = \{g\}$

$(\mathcal{L}_{S/f}, \to_{S/f})$: reg1
$(\mathcal{L}_{S/g}, \to_{S/g})$: reg2, alc2
$(\mathcal{L}_{S/h}, \to_{S/h})$: $\emptyset$

If server g then recovers last, both objects it serves will be active. The states of these objects are therefore transferred to g during its JOIN phase and placed in its log. No actions are taken during g's ACTIVATE phase. The final state of the system (after the recovery of all three servers) is shown below:

$ACT_{S/Names} = \{f, g\} \quad REC_{S/Names} = \emptyset \quad FAIL_{S/Names} = \emptyset$
$ACT_{S/Allocations} = \{g, h\} \quad REC_{S/Allocations} = \emptyset \quad FAIL_{S/Allocations} = \emptyset$

$(\mathcal{L}_{S/f}, \to_{S/f})$: reg1
$(\mathcal{L}_{S/g}, \to_{S/g})$: reg1
$(\mathcal{L}_{S/h}, \to_{S/h})$: $\emptyset$
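
The decision made by server h in this example (dropping alc2 because reg2 is not part of the active state of “Names”) can be sketched in a few lines of Python. This is an illustrative sketch only, not the dissertation's algorithm: requests are modelled as hypothetical `(name, object)` tuples, `deps` is an assumed table listing every request a given request depends on, and `active_state` is an assumed mapping from each active object to the requests in its active state.

```python
def inconsistent_requests(log, deps, active_state):
    """Return the logged requests that depend on a request which was made on
    an active object but is absent from that object's active state."""
    bad = set()
    for req in log:
        for dep in deps.get(req, ()):
            _, obj = dep
            if obj in active_state and dep not in active_state[obj]:
                bad.add(req)
    return bad


# Hypothetical encoding of the example: "Names" is active and reflects only reg1.
reg1, reg2 = ("reg1", "Names"), ("reg2", "Names")
alc2 = ("alc2", "Allocations")
deps = {alc2: {reg2}}                 # alc2 depends on reg2
active_state = {"Names": {reg1}}

log_h = {alc2}
to_delete = inconsistent_requests(log_h, deps, active_state)
recovered = log_h - to_delete         # state h may recover for "Allocations"
```

Under these assumptions `to_delete` contains only alc2, so h recovers “Allocations” in a state that reflects no allocations, matching the system state shown above.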

As another example, suppose that server f recovers first as above, but that servers g and h then recover simultaneously. Again, the JOIN phase of h will not take any actions because the object served by h (“Allocations”) is inactive at the time of the recovery. Because object “Names” is active, though, the JOIN phase of g will recover g's replica of that object. In order to restore the replica to the object's current active state, the JOIN phase of g adds request reg1 to g's log and deletes request reg2. Note, however, that in order to preserve consistency within the log, request alc2 must also be deleted because it depends on request reg2. The state of the system immediately after the JOIN phases of servers g and h will then be:

$ACT_{S/Names} = \{f, g\} \quad REC_{S/Names} = \emptyset \quad FAIL_{S/Names} = \emptyset$
$ACT_{S/Allocations} = \emptyset \quad REC_{S/Allocations} = \{g, h\} \quad FAIL_{S/Allocations} = \emptyset$

After completing their JOIN phases, servers g and h begin their ACTIVATE phases. During their ACTIVATE phases, the servers recover their replicas of object “Allocations”. The servers cooperate in deciding on a new state for the object. Because the only request on the object known to either server (alc2) is inconsistent with the active state of “Names”, the servers will decide on a state that reflects no allocation of resources. The final system state is the same as that in the previous example.

As  a  final  example,  suppose  that  server  h  is  the  first  server  to  recover.  No 
objects  will  be  active  at  the  time  of  the  recovery,  so  no  actions  will  be  taken 
during  the  JOIN  phase  of  h.  During  its  ACTIVATE  phase,  though,  server  h 
will  recover  its  replica  of  “Allocations”  in  the  state  reflected  by  its  log  (the  state 
reflecting  the  allocation  made  to  client  2). 

Suppose now that servers f and g simultaneously recover. The state of the system at the time of the servers' recovery will then be:

$ACT_{S/Names} = \emptyset \quad REC_{S/Names} = \{f, g\} \quad FAIL_{S/Names} = \emptyset$
$ACT_{S/Allocations} = \{h\} \quad REC_{S/Allocations} = \{g\} \quad FAIL_{S/Allocations} = \emptyset$

During its JOIN phase, server g will recover its replica of “Allocations”. Because its log already reflects the current state of that object, no alterations are made to the log. No actions are taken during the JOIN phase of server f.

When servers f and g enter their ACTIVATE phases, they recover their replicas of object “Names”. The servers merge their logs to form a new state for the object that reflects both the registrations of client 1 and client 2. Server f alters its log to reflect this new state by adding in request reg2. Server g similarly alters its log by adding in request reg1. The resulting system state is then:

$ACT_{S/Names} = \{f, g\} \quad REC_{S/Names} = \emptyset \quad FAIL_{S/Names} = \emptyset$
$ACT_{S/Allocations} = \{g, h\} \quad REC_{S/Allocations} = \emptyset \quad FAIL_{S/Allocations} = \emptyset$

$(\mathcal{L}_{S/f}, \to_{S/f})$: reg1, reg2
$(\mathcal{L}_{S/g}, \to_{S/g})$: reg1, reg2, alc2
$(\mathcal{L}_{S/h}, \to_{S/h})$: alc2

Note that request reg2 must be included in the new state of “Names” because the active state of “Allocations” depends on it.

3.4  Summary 

In this chapter we examined the problem of how inconsistencies arise between the states of objects in a system. Inconsistencies can develop in two ways. First, inconsistencies develop between replicas of the same object when recovering servers fail to restore the states of their replicas to those held by other servers in the system. Second, inconsistencies can occur between the states of different objects when recovering servers restore old and out-of-date object states.

A recovery algorithm was outlined for preventing these inconsistencies when a server fails. The algorithm was divided into two phases based on the two types of inconsistencies that occur between objects and replicas.

JOIN phase        Restore a server's replicas of active objects to the current active states of those objects.

ACTIVATE phase    Restore a server's replicas of inactive objects to states that are consistent with the states of all active objects in the system. This phase had the additional property that all recovering servers of an inactive object agreed on the state restored for that object.

The  behaviors  of  the  recovery  phases  were  formally  described  and  it  was  proved 
that  these  behaviors  preserve  consistency  within  a  system. 

The  chapter  concluded  with  several  examples  of  how  the  recovery  mechanism 
restores  consistent  states  to  servers’  object  replicas. 


Chapter  4 


Log  Transformations 

The  main  difficulty  involved  in  implementing  the  recovery  phases  of  the  previous 
chapter  is  ensuring  that  the  alterations  that  occur  to  servers’  logs  preserve  the 
consistency  of  those  logs.  This  chapter  presents  functions  for  adding  and  deleting 
requests  from  a  server’s  log  in  a  way  that  preserves  the  log’s  consistency.  These 
functions  (or  transformations)  will  form  the  basis  of  our  recovery  algorithms. 

4.1  Log  Addition 

In order to bring a recovering server's log into a state that is consistent with the rest of the system, it is sometimes necessary to add requests to the log. Such added requests are generally requests that the server missed receiving because of its failure. For example, consider the execution shown in figure 4.1. In this execution, servers f and g fail after receiving the registration of client 1 but before receiving the registration of client 2. Server h remains active throughout the execution and receives the allocation request (alc2) from client 2. This request is not received by server g, however, because g fails before its delivery. If server g recovers at time t2, it will have to add this request to its log so that the log reflects the current state of “Allocations” (i.e. the state reflected in the log of server h).

Figure 4.1: A recovery requiring addition to a log

$\mathrm{add}_Q(\mathcal{L}_f, \to_f) = (\mathcal{L}, \to_{\mathcal{L}})$
where

$\mathcal{L} = \mathcal{L}_f \cup Q \cup \left( \bigcup_{x.A \in Q} \ \bigcup_{B \in OBJS_f} \mathcal{DEP}_B(x.A) \right)$

$\to_{\mathcal{L}}$ is any extension of $\to_f$ consistent with $\prec_{\mathcal{R}}$.

Figure 4.2: Log addition preserving consistency


The addition of requests to a server's log can cause the log to become inconsistent, however. In the above example, the log of server g becomes inconsistent when request alc2 is added because the client registration on which alc2 depends (reg2) is missing from the log. In order to preserve consistency within a log, any dependents of an added request must also be added to the log (unless they are already present).

Definition 4.1

The set of object B dependents of request x.A is

$\mathcal{DEP}_B(x.A) = \{\, y.B \in \mathcal{R} \mid y.B \prec_{\mathcal{R}} x.A \,\}$

Shown below is the complete sequence of changes required to consistently add request alc2 to the log of server g:

reg1   ->   reg1, alc2   ->   reg1, reg2, alc2

Figure 4.2 presents a function for adding a set of requests, $Q \subseteq \mathcal{R}$, to the log of a server, $f \in SERV$. As shown in the following theorem, this function preserves the consistency of the log.

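Theorem 4.1

If $(\mathcal{L}_f, \to_f)$ is a log for server f consistent with request structure $(\mathcal{R}, \prec_{\mathcal{R}})$, and if $Q \subseteq \mathcal{R}$ is a set of requests on objects served by f, then $\mathrm{add}_Q(\mathcal{L}_f, \to_f)$ is also consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$.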

Proof: Let $(\mathcal{L}, \to_{\mathcal{L}}) = \mathrm{add}_Q(\mathcal{L}_f, \to_f)$. We first show that $(\mathcal{L}, \to_{\mathcal{L}})$ only contains requests on objects served by f. By premise, $(\mathcal{L}_f, \to_f)$ is consistent and so only contains requests on objects served by f. The only requests added to this log by the function are those in Q and its dependents. By premise, all of the requests in Q are on objects served by f. From the definition of the log addition function, the only dependent requests added to the log are those on objects served by f. All of the requests added to the log are therefore on objects served by f.

We now show that, for any request in $(\mathcal{L}, \to_{\mathcal{L}})$, all of its dependents (on objects served by f) are also in $(\mathcal{L}, \to_{\mathcal{L}})$. Let $x.A \in \mathcal{L}$ be any request in the new log. There are three cases:

Case 1: $x.A \in \mathcal{L}_f$

By premise, $(\mathcal{L}_f, \to_f)$ is consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$ and so all dependents of x.A (on objects served by f) are in $\mathcal{L}_f$. Because $(\mathcal{L}, \to_{\mathcal{L}})$ is formed by adding requests to $(\mathcal{L}_f, \to_f)$, it follows that these dependents remain in $(\mathcal{L}, \to_{\mathcal{L}})$.

Case 2: $x.A \in Q$

It follows immediately from the definition of the log addition function that all of the dependents of x.A (on objects served by f) are added to $(\mathcal{L}, \to_{\mathcal{L}})$.

Case 3: $x.A \notin \mathcal{L}_f \wedge x.A \notin Q$

Request x.A must have been added to $(\mathcal{L}, \to_{\mathcal{L}})$ because it is a dependent of some request, y.B, in Q.

$x.A \prec_{\mathcal{R}} y.B$  (4.1)

Let $z.C \in \mathcal{R}$ be any dependent of request x.A made on an object served by f ($C \in OBJS_f$).

$z.C \prec_{\mathcal{R}} x.A$  (4.2)

Because $\prec_{\mathcal{R}}$ is transitive, it follows from equations 4.1 and 4.2 that request y.B is also dependent on z.C.

$z.C \prec_{\mathcal{R}} y.B$

From the definition of the log addition function it follows immediately then that request z.C is added to $(\mathcal{L}, \to_{\mathcal{L}})$.

The last thing we must show is that the order of requests in $(\mathcal{L}, \to_{\mathcal{L}})$ is consistent with $\prec_{\mathcal{R}}$. However, this follows immediately from the definition of the log addition function. □
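
The set computed by the log addition function of figure 4.2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the dissertation's implementation: requests are hypothetical `(name, object)` tuples, `deps` is an assumed table of direct dependencies whose transitive closure stands in for $\prec_{\mathcal{R}}$, and only the set of requests in the new log is computed (the ordering $\to_{\mathcal{L}}$ is not modelled).

```python
def dependents(request, deps, served):
    """Requests that `request` transitively depends on, limited to objects
    in `served`. Dependencies are followed through unserved objects too."""
    seen, stack = set(), [request]
    while stack:
        for d in deps.get(stack.pop(), ()):
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return {d for d in seen if d[1] in served}


def add_requests(log, q, deps, served):
    """Requests in the transformed log: the old log, Q, and Q's dependents."""
    new_log = set(log) | set(q)
    for x in q:
        new_log |= dependents(x, deps, served)
    return new_log


# The example of figure 4.1: g's log holds reg1 and must absorb alc2.
reg1, reg2 = ("reg1", "Names"), ("reg2", "Names")
alc2 = ("alc2", "Allocations")
deps = {alc2: {reg2}}                      # assumed direct dependencies

log_g = {reg1}
new_log_g = add_requests(log_g, {alc2}, deps, served={"Names", "Allocations"})
```

With these inputs `new_log_g` holds reg1, reg2 and alc2: adding alc2 pulls in the registration it depends on, as the theorem requires.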

4.2  Log  Deletion 

In addition to adding requests to its log, a recovering server may also need to delete requests from its log in order to bring it into consistency with the rest of the system. Such deleted requests are generally requests that were not recovered as part of their objects' states by previously recovering servers of the objects. For example, consider the execution shown in figure 4.3. Suppose server f recovers first and restores its replica of “Names” from its log. The state of “Names” will then only reflect the registration of client 1 (reg1); it will not reflect the registration of client 2 (reg2). If server g recovers next, it will have to delete request reg2 from its log in order to bring it into consistency with f.

Figure 4.3: A recovery requiring deletion from a log

$\mathrm{delete}_Q(\mathcal{L}_f, \to_f) = (\mathcal{L}, \to_{\mathcal{L}})$
where

$\mathcal{L} = \{\, x.A \in \mathcal{L}_f \mid x.A \notin Q \ \wedge\ \nexists\, y.B \in Q : y.B \prec_{\mathcal{R}} x.A \,\}$

$\forall x.A, y.B \in \mathcal{L} : (x.A \to_{\mathcal{L}} y.B) \Leftrightarrow (x.A \to_f y.B)$

Figure 4.4: Log deletion preserving consistency

Like the addition of requests, the deletion of requests can cause a server's log to become inconsistent. In the previous example, the log of server g becomes inconsistent when request reg2 is deleted because the allocation that depends on it (alc2) is still present in the log. In order to preserve consistency within a log, any requests that depend on a deleted request must also be removed from the log. Illustrated below is the complete sequence of changes required to remove request reg2 from the log of server g:


Figure 4.4 presents a function for deleting a set of requests, Q, from the log of a server, f. As shown in the following theorem, this function preserves the consistency of the log.

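Theorem 4.2

If $(\mathcal{L}_f, \to_f)$ is a log for server f consistent with request structure $(\mathcal{R}, \prec_{\mathcal{R}})$, and if $Q \subseteq \mathcal{L}_f$ is a subset of the requests in $(\mathcal{L}_f, \to_f)$, then $\mathrm{delete}_Q(\mathcal{L}_f, \to_f)$ is also consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$.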


Proof: Let $(\mathcal{L}, \to_{\mathcal{L}}) = \mathrm{delete}_Q(\mathcal{L}_f, \to_f)$. We first show that $(\mathcal{L}, \to_{\mathcal{L}})$ only contains requests on objects served by f. From the definition of the log deletion function, the requests in $(\mathcal{L}, \to_{\mathcal{L}})$ are a subset of the requests in $(\mathcal{L}_f, \to_f)$. By premise, $(\mathcal{L}_f, \to_f)$ is consistent and so these requests must all be on objects served by f.

We now show that, for any request in $(\mathcal{L}, \to_{\mathcal{L}})$, all of its dependents (on objects served by f) are also in $(\mathcal{L}, \to_{\mathcal{L}})$. The proof is by contradiction. Let x.A be any request in $(\mathcal{L}, \to_{\mathcal{L}})$. Suppose some dependent of x.A (made on an object served by f) is missing from $(\mathcal{L}, \to_{\mathcal{L}})$. Let y.B denote this dependent.

$y.B \prec_{\mathcal{R}} x.A$  (4.3)

From above, we know that $\mathcal{L} \subseteq \mathcal{L}_f$ and so request x.A is in $(\mathcal{L}_f, \to_f)$. Because $(\mathcal{L}_f, \to_f)$ is consistent, it follows that request y.B is also in $(\mathcal{L}_f, \to_f)$. Request y.B must therefore have been removed from the log by the log deletion function when forming $(\mathcal{L}, \to_{\mathcal{L}})$. This could have happened for one of two reasons: either it was in Q or it was dependent on a request in Q.

If request y.B were in Q, then request x.A would also have been removed from the log by the transformation because it depends on y.B (a request in Q), a contradiction. Request y.B must therefore have been removed from the log because it depends on some request, z.C, in Q.

$z.C \prec_{\mathcal{R}} y.B$  (4.4)

Because $\prec_{\mathcal{R}}$ is transitive, it follows from equations 4.3 and 4.4 that request x.A is also dependent on z.C.

$z.C \prec_{\mathcal{R}} x.A$

Request x.A should therefore have been removed from the log because it depends on a request in Q, another contradiction. The new log, $(\mathcal{L}, \to_{\mathcal{L}})$, must therefore contain y.B.

The last thing we must show is that the order of requests in $(\mathcal{L}, \to_{\mathcal{L}})$ is consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$. From the definition of the log deletion function, the requests in $(\mathcal{L}, \to_{\mathcal{L}})$ are ordered the same way they were in $(\mathcal{L}_f, \to_f)$. Because $(\mathcal{L}_f, \to_f)$ is consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$, it follows that this order is consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$. □
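
The log deletion function of figure 4.4 admits a similarly short sketch. As before, this is an illustration under assumptions, not the author's code: requests are hypothetical `(name, object)` tuples, `deps` is an assumed table of direct dependencies whose transitive closure plays the role of $\prec_{\mathcal{R}}$, and the surviving requests simply keep whatever order they already had. The log contents in the example are made up for illustration.

```python
def depends_on(x, y, deps):
    """True if request x transitively depends on request y."""
    seen, stack = set(), [x]
    while stack:
        for d in deps.get(stack.pop(), ()):
            if d == y:
                return True
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return False


def delete_requests(log, q, deps):
    """Keep the requests that are neither in Q nor dependent on a member of Q."""
    return {x for x in log
            if x not in q and not any(depends_on(x, y, deps) for y in q)}


reg1, reg2 = ("reg1", "Names"), ("reg2", "Names")
alc2 = ("alc2", "Allocations")
deps = {alc2: {reg2}}                      # alc2 depends on reg2

# Illustrative log: deleting reg2 must also remove alc2, but not reg1.
remaining = delete_requests({reg1, reg2, alc2}, {reg2}, deps)
```

Here `remaining` contains only reg1, because alc2 depends on the deleted registration.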

4.3  Using  Dependency  Estimates 

The  previous  log  transformations  were  both  based  on  having  explicit  knowledge 
of  the  dependencies  between  requests.  Such  information  is  not  available  in  all 
systems,  however.  When  the  exact  set  of  clients  is  either  unknown  to  the  servers, 
or  is  large  and  dynamically  changing,  it  can  be  difficult  or  impossible  to  maintain 
explicit  dependency  information.  When  this  information  is  not  available  to  the 
servers,  the  preceding  transformations  cannot  be  used. 

This section examines how the log transformations can be modified to use estimates of the true dependencies. The key to the success of these new transformations will be the use of estimates that never under-estimate the true set of the dependencies in the system. We refer to estimates that have this property as sound estimates. By using sound estimates, the transformations will enforce some extraneous orderings because of the inaccuracy of the estimates, but they will also enforce all true dependencies. The actual estimates used in the new transformations are presented later in chapter 6.

4.3.1  Log  Addition 

Consider first the problem of adding a set of requests to a server's log. Let $\overline{\mathcal{DEP}}_B(x.A)$ denote any sound estimate of the set of object B dependents of request x.A.

$\mathcal{DEP}_B(x.A) \subseteq \overline{\mathcal{DEP}}_B(x.A)$  (4.5)

We would like to modify the log addition transformation, $\mathrm{add}_Q(\mathcal{L}_f, \to_f)$, to use $\overline{\mathcal{DEP}}_B(x.A)$ instead of the true dependency set $\mathcal{DEP}_B(x.A)$. Unfortunately, as we show below, the estimate cannot be used directly in place of $\mathcal{DEP}_B(x.A)$. The reason for this is that the log addition transformation uses the transitive property of causal dependencies in order to preserve consistency within a server's log.

$z.C \prec_{\mathcal{R}} y.B \ \wedge\ y.B \prec_{\mathcal{R}} x.A \implies z.C \prec_{\mathcal{R}} x.A$

The estimate does not have this transitive property.

$z.C \in \overline{\mathcal{DEP}}_C(y.B) \ \wedge\ y.B \in \overline{\mathcal{DEP}}_B(x.A) \ \not\Longrightarrow\ z.C \in \overline{\mathcal{DEP}}_C(x.A)$

It may seem counter-intuitive that an estimate would not have the transitive property. However, in the estimates we describe later, an estimate may be able to find evidence contradicting a dependency such as $z.C \to x.A$ without finding evidence to contradict either of the dependencies $z.C \to y.B$ or $y.B \to x.A$. The estimate can then determine that it is not the case that both $z.C \to y.B$ and $y.B \to x.A$ hold. But it cannot determine which one, if any, is the real dependency.

To illustrate how this creates problems in the log addition transformation, consider the transformation $\mathrm{add}_Q(\mathcal{L}_f, \to_f)$. Let x.A be any of the requests in Q added to $(\mathcal{L}_f, \to_f)$. In order to preserve consistency in the log, the addition transformation explicitly adds each dependent of x.A to the log. For each of these dependents, y.B, the addition transformation also automatically adds each of its dependents to the log because, by the transitivity of the request dependency relation, each of these dependents is also a dependent of x.A. Thus, for each request added to the log, all of its dependents are also assured of being added to the log.

However, if an estimate is used, some dependents of added requests may be omitted from the log. If request y.B is added to the log because it is an estimated dependent of x.A (it might not be a real dependent), then the transformation should also add to the log all estimated dependents of y.B, in order to preserve the consistency of the log. From the definition of the transformation, though, only estimated dependents of x.A would be added to the log. It is possible that some of the estimated dependents of y.B may not be estimated dependents of x.A. These extra estimated dependents would be omitted from the log, creating an inconsistency.

1. $R = 0$

2. $\mathcal{L}^{(0)} = \mathcal{L}_f \cup Q$

3. $NEWREQS^{(0)} = \mathcal{L}^{(0)} - \mathcal{L}_f$

4. while $NEWREQS^{(R)} \neq \emptyset$

   4.1 $R = R + 1$

   4.2 $\mathcal{L}^{(R)} = \mathcal{L}^{(R-1)} \cup \left( \bigcup_{B \in OBJS_f} \ \bigcup_{x.A \in NEWREQS^{(R-1)}} \overline{\mathcal{DEP}}_B(x.A) \right)$

   4.3 $NEWREQS^{(R)} = \mathcal{L}^{(R)} - \mathcal{L}^{(R-1)}$

Figure 4.5: Iterative addition of requests

In order to use the dependency set estimate, the log addition transformation must add requests to a log iteratively. In each round of the iteration, the transformation adds to the log the estimated dependents of the requests added in the previous round. An algorithm for determining the complete set of requests in the transformed log using this addition scheme is shown in figure 4.5. In the algorithm, R is the round number, $NEWREQS^{(R)}$ is the set of new requests added to the log in round R, and $\mathcal{L}^{(R)}$ is the complete set of requests contained in the log after round R.

The complete log addition transformation using this algorithm is presented in figure 4.6. As shown in the following theorem, this transformation preserves the consistency of a log.

$\mathrm{add}_Q(\mathcal{L}_f, \to_f) = (\mathcal{L}, \to_{\mathcal{L}})$
where

$\mathcal{L} = \mathcal{L}^{(R^*)}, \quad R^* = \mathrm{MIN}\{\, R \mid \mathcal{L}^{(R)} = \mathcal{L}^{(R+1)} \,\}$

$\to_{\mathcal{L}}$ is any extension of $\to_f$ consistent with $\overline{\mathcal{DEP}}_B(x.A)$.

Figure 4.6: Log addition using estimates


Theorem 4.3

If $(\mathcal{L}_f, \to_f)$ is a log for server f consistent with request structure $(\mathcal{R}, \prec_{\mathcal{R}})$, and if $Q \subseteq \mathcal{R}$ is a set of requests on objects served by f, then $\mathrm{add}_Q(\mathcal{L}_f, \to_f)$ is also consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$.

Proof: Let $(\mathcal{L}, \to_{\mathcal{L}}) = \mathrm{add}_Q(\mathcal{L}_f, \to_f)$. We first show that $(\mathcal{L}, \to_{\mathcal{L}})$ only contains requests on objects served by f. By premise, both $(\mathcal{L}_f, \to_f)$ and Q only contain requests on objects served by f. It thus follows immediately that $\mathcal{L}^{(0)}$ only contains requests on objects served by f. In each round of the addition iteration, only requests on objects served by f are added to the log. It therefore follows by induction that each $\mathcal{L}^{(R)}$ only contains requests on objects served by f.

We now show that, for any request in $(\mathcal{L}, \to_{\mathcal{L}})$, all of its dependents (on objects served by f) are also in $(\mathcal{L}, \to_{\mathcal{L}})$. Let $x.A \in \mathcal{L}$ be any request in the new log. There are two cases:

Case 1: $x.A \in \mathcal{L}_f$

By premise, $(\mathcal{L}_f, \to_f)$ is consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$ and so all of the dependents of x.A (on objects served by f) are in $\mathcal{L}_f$. Because $(\mathcal{L}, \to_{\mathcal{L}})$ is formed by adding requests to log $(\mathcal{L}_f, \to_f)$, it follows that the dependents remain in $(\mathcal{L}, \to_{\mathcal{L}})$.

Case 2: $x.A \in NEWREQS^{(R)}$ (i.e. x.A was added in round R)

From the definition of the iterative addition algorithm, all of the dependents of x.A (on objects served by f) are added to the log in round R + 1.

The last thing we must show is that the order of requests in $(\mathcal{L}, \to_{\mathcal{L}})$ is consistent with $\prec_{\mathcal{R}}$. By definition, $\to_{\mathcal{L}}$ is consistent with $\overline{\mathcal{DEP}}_B(x.A)$. From property 4.5 of the estimate, it follows that if two requests, $x.A, y.B \in \mathcal{L}$, are related ($y.B \prec_{\mathcal{R}} x.A$) then $y.B \in \overline{\mathcal{DEP}}_B(x.A)$ and so these requests are properly ordered in $(\mathcal{L}, \to_{\mathcal{L}})$. □
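
The iteration of figure 4.5 can be sketched in Python to show why a single pass is not enough when the estimate is not transitive. This is a hedged sketch: `est_dep` is a hypothetical callable standing in for the estimate $\overline{\mathcal{DEP}}$ (returning estimated dependents on all objects), requests are hypothetical `(name, object)` tuples, and the example estimate is invented for illustration and merely assumed to be sound.

```python
def add_with_estimates(log, q, est_dep, served):
    """Grow the log round by round until no new requests appear."""
    current = set(log) | set(q)            # L^(0)
    new_reqs = current - set(log)          # NEWREQS^(0)
    while new_reqs:
        added = set()
        for x in new_reqs:
            added |= {d for d in est_dep(x) if d[1] in served}
        nxt = current | added              # L^(R)
        new_reqs = nxt - current           # NEWREQS^(R)
        current = nxt
    return current


# Invented estimate: z is an estimated dependent of y, and y of x,
# but z is NOT an estimated dependent of x (no transitivity).
x, y, z = ("x", "A"), ("y", "B"), ("z", "C")
estimate = {x: {y}, y: {z}}
est_dep = lambda r: estimate.get(r, set())
served = {"A", "B", "C"}

one_pass = {x} | est_dep(x)                               # omits z
iterated = add_with_estimates(set(), {x}, est_dep, served)
```

The single pass yields only x and y, leaving y without its estimated dependent z, whereas the iterative version also picks up z in its second round.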

4.3.2  Log  Deletion 

Consider now the problem of deleting a set of requests from a log. We would like to modify the log deletion transformation to use an estimate of the relationship between requests. Let $\mathcal{CON}(x.A \prec y.B)$ denote any such sound estimate.

$\forall x.A, y.B \in \mathcal{R} : \mathcal{CON}(x.A \prec y.B) \implies x.A \not\prec_{\mathcal{R}} y.B$  (4.6)

Note that $\mathcal{CON}(x.A \prec y.B)$ estimates the predicate that two requests are unrelated.

As with the log addition transformation, this estimate cannot be used directly in the log deletion transformation. If it were, inconsistencies could occur in the transformed logs because the transformation may fail to remove all requests that depend on the deleted requests. In order to use the estimate, the log deletion transformation must iteratively delete requests from a log. An algorithm for doing this is shown in figure 4.7. In the algorithm, R is the round number, $DELETED^{(R)}$ is the set of requests deleted from the log in round R, and $\mathcal{L}^{(R)}$ is the set of requests contained in the log after round R.

1. $R = 0$

2. $\mathcal{L}^{(0)} = \mathcal{L}_f - Q$

3. $DELETED^{(0)} = \mathcal{L}_f - \mathcal{L}^{(0)}$

4. while $DELETED^{(R)} \neq \emptyset$

   4.1 $R = R + 1$

   4.2 $\mathcal{L}^{(R)} = \{\, y.B \in \mathcal{L}^{(R-1)} \mid \forall x.A \in DELETED^{(R-1)} : \mathcal{CON}(x.A \prec y.B) \,\}$

   4.3 $DELETED^{(R)} = \mathcal{L}^{(R-1)} - \mathcal{L}^{(R)}$

Figure 4.7: Iterative deletion of requests

$\mathrm{delete}_Q(\mathcal{L}_f, \to_f) = (\mathcal{L}, \to_{\mathcal{L}})$
where

$\mathcal{L} = \mathcal{L}^{(R^*)}, \quad R^* = \mathrm{MIN}\{\, R \mid \mathcal{L}^{(R)} = \mathcal{L}^{(R+1)} \,\}$

$\forall x.A, y.B \in \mathcal{L} : (x.A \to_{\mathcal{L}} y.B) \Leftrightarrow (x.A \to_f y.B)$

Figure 4.8: Log deletion using estimates


The  complete  log  deletion  transformation  using  this  algorithm  is  presented  in 
figure  4.8.  As  shown  in  the  following  theorem,  this  transformation  preserves  the 
consistency  of  a  log. 

Theorem 4.4

If $(\mathcal{L}_f, \to_f)$ is a log for server f consistent with request structure $(\mathcal{R}, \prec_{\mathcal{R}})$, and if $Q \subseteq \mathcal{L}_f$ is a subset of the requests in $(\mathcal{L}_f, \to_f)$, then $\mathrm{delete}_Q(\mathcal{L}_f, \to_f)$ is also consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$.

Proof: Let $(\mathcal{L}, \to_{\mathcal{L}}) = \mathrm{delete}_Q(\mathcal{L}_f, \to_f)$. We first show that $(\mathcal{L}, \to_{\mathcal{L}})$ only contains requests on objects served by f. Because $(\mathcal{L}, \to_{\mathcal{L}})$ is formed by deleting requests from $(\mathcal{L}_f, \to_f)$, we know that $\mathcal{L} \subseteq \mathcal{L}_f$. By premise, $(\mathcal{L}_f, \to_f)$ is consistent and so all of these requests are on objects served by f.

We now show that, for any request in $(\mathcal{L}, \to_{\mathcal{L}})$, all of its dependents (on objects served by f) are also in $(\mathcal{L}, \to_{\mathcal{L}})$. The proof is by contradiction. Let $x.A \in \mathcal{L}$ be any request in the transformed log, and let $y.B \in \mathcal{R}$ be any of its dependents ($y.B \prec_{\mathcal{R}} x.A$) on an object served by f ($B \in OBJS_f$). Suppose that y.B is not in $(\mathcal{L}, \to_{\mathcal{L}})$. Because $\mathcal{L} \subseteq \mathcal{L}_f$ we know that $x.A \in \mathcal{L}_f$. Because $(\mathcal{L}_f, \to_f)$ is consistent by premise, it follows that $y.B \in \mathcal{L}_f$. Request y.B must therefore have been removed from the log in some round, R, of the iterative deletion algorithm. However, by definition of the algorithm, request x.A would then have been removed from the log in round R + 1 of the iteration because it depends on request y.B (by property 4.6, the estimate cannot report y.B and x.A as unrelated), contradicting the fact that $x.A \in \mathcal{L}$. The transformed log, $(\mathcal{L}, \to_{\mathcal{L}})$, must therefore contain y.B.

The last thing we must show is that the order of requests in $(\mathcal{L}, \to_{\mathcal{L}})$ is consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$. However, by definition, the order of requests in $(\mathcal{L}, \to_{\mathcal{L}})$ is consistent with the order of requests in $(\mathcal{L}_f, \to_f)$, which is by premise consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$. □
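
The iterative deletion of figure 4.7 can be sketched the same way. In this sketch, which is an illustration under assumptions rather than the dissertation's implementation, `unrelated(x, y)` is a hypothetical predicate standing in for $\mathcal{CON}(x \prec y)$: it returns True only when the estimate is sure that y does not depend on x. The example encodes the estimate as an invented set `maybe_dep` of (earlier, later) pairs whose dependency cannot be ruled out.

```python
def delete_with_estimates(log, q, unrelated):
    """Shrink the log round by round until a round deletes nothing."""
    current = set(log) - set(q)            # L^(0)
    deleted = set(log) - current           # DELETED^(0)
    while deleted:
        # Keep y only if it is known to be unrelated to everything
        # deleted in the previous round.
        nxt = {y for y in current if all(unrelated(x, y) for x in deleted)}
        deleted = current - nxt            # DELETED^(R)
        current = nxt                      # L^(R)
    return current


a, b, c, d = ("a", "A"), ("b", "A"), ("c", "B"), ("d", "B")
maybe_dep = {(a, b), (b, c)}               # invented: b may depend on a, c on b
unrelated = lambda x, y: (x, y) not in maybe_dep

remaining = delete_with_estimates({a, b, c, d}, {a}, unrelated)
```

Deleting a removes b in the first round and c in the second, even though the estimate never links c to a directly; only d survives.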

4.4  Summary 

This chapter presented several transformations for altering the log of a server while preserving its consistency. The chapter began by presenting transformations for adding and deleting requests from a log. These transformations were based on having explicit knowledge of the dependencies between client requests. It was then shown how these transformations can be modified to use estimates of the request dependencies when exact information is not available. A key to the correctness of these new transformations was the use of approximations that never under-estimated the true set of dependencies. By using sound estimates, the transformations were assured of enforcing all true dependencies, in addition to a few extraneous ones.


Chapter  5 


Recovery  Solutions 


In  this  chapter  we  present  algorithms  for  solving  the  JOIN  and  ACTIVATE 
problems.  These  algorithms  are  based  on  the  log  transformations  of  chapter  4. 
We  begin  by  assuming  that  explicit  dependency  information  is  not  available  in 
the  system  and  so  the  only  transformations  available  to  the  recovery  algorithms 
are  those  based  on  dependency  estimates.  We  then  show  how  these  recovery 
algorithms  can  be  simplified  when  the  transformations  using  explicit  dependency 
information  are  available. 

5.1  JOIN  Solution 

When  a  server  first  recovers  from  a  failure  it  restores  its  replicas  of  active  objects 
to  those  objects’  current  states.  The  server  alters  its  log  to  reflect  the  current 
object  states  and  then  replays  the  log  to  restore  its  replicas. 

A recovering server's log may be out of date with respect to the current states of active objects in two ways. First, the log may not reflect all of the requests present in those objects' current states. Such requests are generally those that the server did not receive while it was failed. We let $MS_{S/f}$ denote the set of requests on active objects missing from the log of a recovering server, f, in state S.

$MS_{S/f} = \bigcup_{\{A \in OBJS_f \,\mid\, ACT_{S/A} \neq \emptyset\}} \left[ AS_{S/A} - (\mathcal{L}_{S/f}, \to_{S/f})|_A \right]$

Second, a recovering server's log may be out of date because it reflects requests on active objects that are not present in those objects' current states. Such requests are generally those that the active servers failed to recover after some previous failure event. We let $NR_{S/f}$ denote the set of requests on active objects present in the log of server f, in state S, that are not present in their objects' active states.

$NR_{S/f} = \bigcup_{\{A \in OBJS_f \,\mid\, ACT_{S/A} \neq \emptyset\}} \left[ (\mathcal{L}_{S/f}, \to_{S/f})|_A - AS_{S/A} \right]$

In order to restore correct object replicas, a recovering server must remove the requests in $NR_{S/f}$ from its log and add those in $MS_{S/f}$. The complete algorithm for solving the JOIN problem for server f in state S is shown in figure 5.1. In the algorithm, T is the state constructed to solve the problem.
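
The two sets can be computed directly from a server's log and the active states of the objects it serves. The Python sketch below is illustrative only: requests are hypothetical `(name, object)` tuples, and `active_state` is an assumed mapping from each active object to its current active state (an object that is not a key is treated as inactive).

```python
def restrict(log, obj):
    """Requests in `log` that were made on object `obj`."""
    return {r for r in log if r[1] == obj}


def missing_and_stale(log, served, active_state):
    """Return (missing, stale): requests in active states but not in the log,
    and requests in the log but not in the corresponding active states."""
    missing, stale = set(), set()
    for obj in served:
        if obj in active_state:            # only active objects are considered
            logged = restrict(log, obj)
            missing |= active_state[obj] - logged
            stale |= logged - active_state[obj]
    return missing, stale


# Server g from the second example of section 3.3: "Names" is active with
# reg1 only, "Allocations" is inactive, and g's log holds reg2 and alc2.
reg1, reg2 = ("reg1", "Names"), ("reg2", "Names")
alc2 = ("alc2", "Allocations")
missing, stale = missing_and_stale({reg2, alc2},
                                   served={"Names", "Allocations"},
                                   active_state={"Names": {reg1}})
```

For this input `missing` contains reg1 and `stale` contains reg2; alc2 appears in neither set because its object is inactive, and it is removed only as a consequence of deleting reg2.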

Note  that  in  step  JSl  the  new  log  is  tested  to  make  sure  that  the  addition  and 
deletion  of  requests  yielded  the  correct  logged  state.  The  reason  for  this  is  that 
the  transformations  may  inadvertently  attempt  to  add  or  delete  a  request  from 
the  active  state  logged  for  an  object.  Because  dependency  estimates  are  used,  the 
log  transformations  may  occasionally  incorrectly  believe  that  a  dependency  holds 
between  two  requests,  one  of  which  is  in  its  object’s  active  state  and  the  other 
of  which  is  not.  When  this  happens,  the  transformations  may  incorrectly  add 
or  delete  requests  from  the  logged  state  of  an  active  object  in  order  to  preserve 
the  log’s  consistency.  When  this  situation  occurs,  the  recovery  algorithm  must 
abort and wait until better estimates of the dependencies can be formed. The
technique of recovery logs [Gra78] (not to be confused with the term “log” used
in this dissertation) can be used to record and undo any changes to a server’s log
resulting from an aborted recovery attempt.
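The test-and-abort behaviour can be sketched as follows. This is a simplified
illustration, not the transformations of chapter 4: only deletion is modelled, and
`est_depends(r, d)` is a hypothetical estimate that returns True when request `r`
is believed to depend on request `d`. On abort the caller simply keeps the old log,
which stands in for undoing changes through a recovery log.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Request = Tuple[str, str]  # (request id, object name)


class RecoveryAborted(Exception):
    """Raised when the transformed log no longer matches an active state."""


def delete_closure(
    log: List[Request],
    to_remove: Iterable[Request],
    est_depends: Callable[[Request, Request], bool],
) -> List[Request]:
    """Drop `to_remove` plus every logged request estimated to depend on a dropped one."""
    removed: Set[Request] = set(to_remove)
    changed = True
    while changed:
        changed = False
        for r in log:
            if r not in removed and any(est_depends(r, d) for d in removed):
                removed.add(r)
                changed = True
    return [r for r in log if r not in removed]


def join_delete_and_test(
    log: List[Request],
    stale: Set[Request],
    active_states: Dict[str, Set[Request]],
    est_depends: Callable[[Request, Request], bool],
) -> List[Request]:
    """Build the new log, then check it against every active state."""
    new_log = delete_closure(log, stale, est_depends)
    for obj, state in active_states.items():
        if {r for r in new_log if r[1] == obj} != state:
            raise RecoveryAborted(obj)  # old log is untouched; retry later
    return new_log


log = [("x1", "A"), ("x2", "A"), ("y1", "B")]
active = {"A": {("x1", "A")}, "B": {("y1", "B")}}
stale = {("x2", "A")}
accurate = lambda r, d: False  # no dependencies on the stale request
pessimistic = lambda r, d: r == ("y1", "B") and d == ("x2", "A")  # spurious dependency
```

With the accurate estimate the new log is accepted; with the pessimistic one the
deletion also removes `y1` on object B, the test fails, and the attempt aborts.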

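The JOIN recovery algorithm is formally proved correct below:

JS1.  $(\mathcal{L}_{T/f}, \rightarrow_{T/f}) = \mathrm{add}_{\mathcal{MS}_{S/f}}\big(\mathrm{delete}_{\mathcal{MR}_{S/f}}(\mathcal{L}_{S/f}, \rightarrow_{S/f})\big)$

      if $\exists A \in \mathcal{OBJS}_f\ (\mathcal{ACT}_{S/A} \neq \emptyset) : (\mathcal{L}_{T/f}, \rightarrow_{T/f})|_A \neq \mathcal{AS}_{S/A}$
      then abort

JS2.  $(\mathcal{L}_{T/g}, \rightarrow_{T/g}) = (\mathcal{L}_{S/g}, \rightarrow_{S/g}) \quad \forall g \in \mathcal{SERV} - \{f\}$

JS3.  $\mathcal{ACT}_{T/A} = \mathcal{ACT}_{S/A} \cup \{f\}$
      $\mathcal{REC}_{T/A} = \mathcal{REC}_{S/A} - \{f\}$
          $\forall A \in \mathcal{OBJS}_f\ (\mathcal{ACT}_{S/A} \neq \emptyset)$

      $\mathcal{ACT}_{T/A} = \mathcal{ACT}_{S/A}$
      $\mathcal{REC}_{T/A} = \mathcal{REC}_{S/A}$
          $\forall A \in \mathcal{OBJS}\ (A \notin \mathcal{OBJS}_f \lor \mathcal{ACT}_{S/A} = \emptyset)$

JS4.  $\mathcal{FAIL}_{T/A} = \mathcal{FAIL}_{S/A} \quad \forall A \in \mathcal{OBJS}$
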
Figure 5.1: Solution to the JOIN problem for server $f$ in state $S$

Theorem 5.1

If $S$ is a state consistent with request structure $(\mathcal{R}, \prec_{\mathcal{R}})$, and if $f$ is a server
recovering in state $S$, then state $T$ as constructed above correctly solves the
JOIN problem for server $f$ in state $S$ under request structure $(\mathcal{R}, \prec_{\mathcal{R}})$.

Proof: We must show that the five conditions (JC1-JC5) of the JOIN problem
are satisfied by state $T$.

The first condition, JC1 (the consistency of $(\mathcal{L}_{T/f}, \rightarrow_{T/f})$ with $(\mathcal{R}, \prec_{\mathcal{R}})$),
follows immediately from the fact that $(\mathcal{L}_{S/f}, \rightarrow_{S/f})$ is consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$ (by
premise) and that both log transformations preserve consistency (theorems 4.3
and 4.4).

The second condition, JC2 (the consistency of $(\mathcal{L}_{T/f}, \rightarrow_{T/f})$ with the current
states of active objects), follows immediately from the test in step JS1 of the
JOIN solution.

Conditions JC3, JC4, and JC5 follow directly from steps JS2, JS3, and JS4
of the JOIN solution, respectively. □

5.2  ACTIVATE  Solution 

Once a server completes its JOIN phase, it begins recovering its replicas of
inactive objects. All recovering servers of an inactive object participate in the
object’s recovery. The recovering servers start by merging their logs to form the
most up-to-date state possible for the object. We let $\mathcal{IS}_{S/A}$ denote this ideal
state for inactive object $A$ in state $S$.

$$\mathcal{IS}_{S/A} = \bigcup_{f \in \mathcal{REC}_{S/A}} (\mathcal{L}_{S/f}, \rightarrow_{S/f})|_A$$

The ideal state may be inconsistent with the states of some active objects in
the system, however. There may be requests in the ideal state that have
dependencies on requests that are not reflected in their objects’ active states. These
inconsistent requests should be omitted from the new state of object $A$ so that
the overall state of the system remains consistent. We let $\mathcal{SAFE}_S(x.A)$ denote
the predicate that all of the dependents of request $x.A$, on objects that are active
in state $S$, are present in their objects’ active states.

$$\mathcal{SAFE}_S(x.A) \equiv \bigwedge_{\{B \in \mathcal{OBJS} \,\mid\, \mathcal{ACT}_{S/B} \neq \emptyset\}} \mathcal{DEP}_B(x.A) \subseteq \mathcal{AS}_{S/B}$$

Because we are assuming that explicit dependency information is not available
in the system, the exact value of $\mathcal{SAFE}_S(x.A)$ is not available to the recovery
mechanism. Instead, we assume that the recovery mechanism has available to it
an estimate, $\widehat{\mathcal{SAFE}}_S(x.A)$, of the safety predicate. This estimate, like the other
estimates, has the property that it is sound.

$$\widehat{\mathcal{SAFE}}_S(x.A) \Rightarrow \mathcal{SAFE}_S(x.A)$$

The state recovered by the servers of object $A$ will then consist of the requests
in the ideal state, $\mathcal{IS}_{S/A}$, that are estimated to be safe. We let $\mathcal{NS}_{S/A}$ denote
this state.

$$\mathcal{NS}_{S/A} = \{\, x.A \in \mathcal{IS}_{S/A} \mid \widehat{\mathcal{SAFE}}_S(x.A) \,\}$$
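The construction of the ideal state and the new state can be sketched in Python as
follows. The data representation (requests as (request id, object name) pairs, logs
as lists) and the names `ideal_state`, `new_state`, and `est_safe` are assumptions
made for this illustration; `est_safe` plays the role of the sound safety estimate.

```python
from typing import Callable, List, Set, Tuple

Request = Tuple[str, str]  # (request id, object name)


def ideal_state(recovering_logs: List[List[Request]], obj: str) -> Set[Request]:
    """Merge the requests on `obj` found in the logs of all recovering servers."""
    merged: Set[Request] = set()
    for log in recovering_logs:
        merged |= {r for r in log if r[1] == obj}
    return merged


def new_state(
    recovering_logs: List[List[Request]],
    obj: str,
    est_safe: Callable[[Request], bool],
) -> Set[Request]:
    """Keep only the requests of the ideal state that are estimated to be safe."""
    return {r for r in ideal_state(recovering_logs, obj) if est_safe(r)}


logs = [
    [("x1", "A"), ("y1", "B")],  # log of one recovering server of A
    [("x1", "A"), ("x2", "A")],  # log of another recovering server of A
]
est_safe = lambda r: r != ("x2", "A")  # hypothetical estimate: x2 is not known to be safe
ideal = ideal_state(logs, "A")
recovered = new_state(logs, "A", est_safe)
```

Because the estimate is sound but not necessarily exact, a request that is in fact
safe may still be left out of the recovered state.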

Each recovering server installs the new state for object $A$ into its log the same
way it installed the active states of objects during its JOIN phase. First, the
server deletes from its log any request on object $A$ that is not part of the new
state. We let $\mathcal{MR}_{S/f}(A)$ denote the set of requests removed from the log of server
$f \in \mathcal{REC}_{S/A}$.

$$\mathcal{MR}_{S/f}(A) = (\mathcal{L}_{S/f}, \rightarrow_{S/f})|_A - \mathcal{NS}_{S/A}$$

The server then adds to its log any request in the new state that is not already
logged. We let $\mathcal{MS}_{S/f}(A)$ denote the set of requests added to the log of server
$f \in \mathcal{REC}_{S/A}$.

$$\mathcal{MS}_{S/f}(A) = \mathcal{NS}_{S/A} - (\mathcal{L}_{S/f}, \rightarrow_{S/f})|_A$$

AS1.  $(\mathcal{L}_{T/f}, \rightarrow_{T/f}) = \mathrm{add}_{\mathcal{MS}_{S/f}(A)}\big(\mathrm{delete}_{\mathcal{MR}_{S/f}(A)}(\mathcal{L}_{S/f}, \rightarrow_{S/f})\big) \quad \forall f \in \mathcal{REC}_{S/A}$

      if $\exists f \in \mathcal{REC}_{S/A} : (\mathcal{L}_{T/f}, \rightarrow_{T/f})|_A \neq \mathcal{NS}_{S/A}$
      then abort

      if $\exists f \in \mathcal{REC}_{S/A}$ and $\exists B \in \mathcal{OBJS}_f\ (f \in \mathcal{ACT}_{S/B})$
          s.t. $(\mathcal{L}_{T/f}, \rightarrow_{T/f})|_B \neq (\mathcal{L}_{S/f}, \rightarrow_{S/f})|_B$
      then abort

      if $\exists B \in \mathcal{OBJS}\ (\mathcal{ACT}_{S/B} \neq \emptyset)$ and $\exists y.B \in \mathcal{AS}_{S/B}$
          s.t. $\mathcal{DEP}_A(y.B) \not\subseteq \mathcal{NS}_{S/A}$
      then abort

AS2.  $(\mathcal{L}_{T/g}, \rightarrow_{T/g}) = (\mathcal{L}_{S/g}, \rightarrow_{S/g}) \quad \forall g \in \mathcal{SERV} - \mathcal{REC}_{S/A}$

AS3.  $\mathcal{ACT}_{T/A} = \mathcal{REC}_{S/A}$
      $\mathcal{REC}_{T/A} = \emptyset$

      $\mathcal{ACT}_{T/B} = \mathcal{ACT}_{S/B}$
      $\mathcal{REC}_{T/B} = \mathcal{REC}_{S/B}$
          $\forall B \in \mathcal{OBJS} - \{A\}$

AS4.  $\mathcal{FAIL}_{T/B} = \mathcal{FAIL}_{S/B} \quad \forall B \in \mathcal{OBJS}$

Figure 5.2: Solution to the ACTIVATE problem for object $A$ in state $S$

The complete algorithm for solving the ACTIVATE problem for object $A$ in state
$S$ is shown in figure 5.2. Again, $T$ is the state constructed to solve the problem.

Note that the new logs of the recovering servers are tested in step AS1 to make
sure that the logged states of active objects are not corrupted. As with the JOIN
algorithm, the use of dependency estimates can cause the log transformations to
inadvertently add or delete requests from the logged state of an active object.
When this occurs, the ACTIVATE algorithm must abort and wait until better
dependency estimates can be formed before trying to ACTIVATE object $A$.
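The per-server sets and the three tests of the activation step can be sketched in
Python as follows. This is only an illustration under assumed data structures:
logs are lists of (request id, object name) pairs keyed by server name,
`active_servers` maps each active object to its active servers, and `deps_on(y, obj)`
is a hypothetical function returning the requests on `obj` on which `y` depends.
The log transformations themselves are not modelled; the new logs are taken as given.

```python
from typing import Callable, Dict, List, Set, Tuple

Request = Tuple[str, str]  # (request id, object name)
Log = List[Request]


def on_object(log: Log, obj: str) -> Log:
    """The log restricted to requests on `obj`, in log order."""
    return [r for r in log if r[1] == obj]


def activate_sets(log: Log, obj: str, new_state: Set[Request]):
    """Return (to_remove, to_add) for one recovering server of `obj`."""
    logged = set(on_object(log, obj))
    return logged - new_state, new_state - logged


def activation_tests_pass(
    old_logs: Dict[str, Log],
    new_logs: Dict[str, Log],
    obj: str,
    new_state: Set[Request],
    active_states: Dict[str, Set[Request]],
    active_servers: Dict[str, Set[str]],
    deps_on: Callable[[Request, str], Set[Request]],
) -> bool:
    for f, new_log in new_logs.items():
        # Test 1: every recovering server logs exactly the new state of obj.
        if set(on_object(new_log, obj)) != new_state:
            return False
        # Test 2: logged states of already-active objects are unchanged.
        for b, servers in active_servers.items():
            if f in servers and on_object(new_log, b) != on_object(old_logs[f], b):
                return False
    # Test 3: the new state holds every obj-request that an active state depends on.
    for state in active_states.values():
        for y in state:
            if not deps_on(y, obj) <= new_state:
                return False
    return True


old_logs = {"f": [("x1", "A"), ("x2", "A"), ("y1", "B")]}
state_a = {("x1", "A")}
to_remove, to_add = activate_sets(old_logs["f"], "A", state_a)
active_states = {"B": {("y1", "B")}}
active_servers = {"B": {"f"}}
deps_on = lambda y, obj: {("x1", "A")} if (y, obj) == (("y1", "B"), "A") else set()
good = {"f": [("x1", "A"), ("y1", "B")]}
bad = {"f": [("x1", "A")]}  # the transformation also dropped y1 on active object B
```

A False result corresponds to the abort branches of the activation step; the caller
would keep the old logs and retry once better estimates are available.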

The  ACTIVATE  algorithm  is  formally  proved  correct  below: 

Theorem 5.2

If $S$ is a state consistent with request structure $(\mathcal{R}, \prec_{\mathcal{R}})$, and if $A \in \mathcal{OBJS}$
is an inactive object in state $S$, then state $T$ as constructed above correctly
solves the ACTIVATE problem for object $A$ in state $S$ under request structure
$(\mathcal{R}, \prec_{\mathcal{R}})$.

Proof: We must show that the seven conditions (AC1-AC7) of the ACTIVATE
problem are satisfied by state $T$.

The first condition, AC1 (the consistency of the recovering servers’ new logs
with $(\mathcal{R}, \prec_{\mathcal{R}})$), follows immediately from the fact that the logs were consistent
with $(\mathcal{R}, \prec_{\mathcal{R}})$ in state $S$ (by premise) and that both log transformations preserve
consistency (theorems 4.3 and 4.4).

The property that all recovering servers of object $A$ agree on the new state for
$A$ (condition AC2) follows directly from the first test in step AS1; if the algorithm
does not abort, the logs of all recovering servers of $A$ will reflect $\mathcal{NS}_{S/A}$.

Condition AC3 asserts that the new state for $A$ is consistent with the states
of all other active objects in the system. We show that this condition holds in
state $T$ in two parts. First, we show that there are no requests in the new state
of $A$ that have dependencies on requests that are not part of their objects’ active
states.

$$\forall x.A \in \mathcal{AS}_{T/A} : \forall y.B \in \mathcal{R}\ (y.B \prec_{\mathcal{R}} x.A) : \mathcal{ACT}_{T/B} \neq \emptyset \Rightarrow y.B \in \mathcal{AS}_{T/B}$$

This portion of the condition follows directly from the definition of safety and the
fact that only safe requests are included in the new state of object $A$. Note that
by definition of the ACTIVATE solution, the states of all active objects other
than $A$ do not change between states $S$ and $T$.

The second part of the proof of condition AC3 involves showing that all object
$A$ dependents of requests reflected in the state of another active object, $B$, are
present in the new state of $A$.

$$\forall y.B \in \mathcal{AS}_{T/B}\ (\mathcal{ACT}_{T/B} \neq \emptyset) : \forall x.A \in \mathcal{R}\ (x.A \prec_{\mathcal{R}} y.B) : x.A \in \mathcal{AS}_{T/A}$$

This part follows immediately from the third test in step AS1.

Condition AC4 follows immediately from the second test in step AS1 of the
algorithm. Conditions AC5, AC6, and AC7 follow immediately from steps AS2,
AS3, and AS4 of the algorithm, respectively. □

5.3  Using  Explicit  Dependency  Information 

The  preceding  recovery  algorithms  assume  that  explicit  dependency  information 
is  not  available  in  the  system.  Both  algorithms  use  estimates  of  the  dependencies 
between  requests  to  ensure  that  a  recovering  server  restores  consistent  states  to 
its object replicas. However, the use of inaccurate estimates sometimes causes
the log transformations used by the algorithms to corrupt the logged states of
active objects. The algorithms must therefore test for this condition and abort
if it occurs.

In this section, we examine how the recovery algorithms are simplified when
exact dependency information is available in the system. When such information
is present, the algorithms can substitute the log transformations based on
estimates with those based on exact dependency values. These precise
transformations have the advantage that they do not corrupt the logged states of active
objects. As a result, most of the tests in steps JS1 and AS1 of the recovery
algorithms can be omitted.

5.3.1  JOIN  Simplification 

We begin by showing that the states of active objects logged in step JS1 of
the JOIN algorithm are never corrupted when the log transformations based on
explicit dependency information are used. We do this in two lemmas. The first
lemma shows that the deletion transformation never removes from the log any
request in the active state of an object. The second lemma proves that the
addition transformation never adds to the log a request on an active object that
is not in that object’s active state. It follows from these two lemmas that the
test in step JS1 of the JOIN solution can be omitted when exact dependency
information is available in the system.
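The claim about deletion can be checked on a small example. The sketch below is an
illustration only, not the transformation of chapter 4: `depends` is an assumed set of
(earlier, later) pairs meaning that `later` depends on `earlier`, and deletion is
modelled as removing the stale requests together with every logged request that
depends on a removed one.

```python
from typing import Dict, List, Set, Tuple

Request = Tuple[str, str]  # (request id, object name)
Dep = Tuple[Request, Request]  # (earlier, later): later depends on earlier


def observably_consistent(active_states: Dict[str, Set[Request]], depends: Set[Dep]) -> bool:
    """No active state reflects a request without its dependents on other active objects."""
    for earlier, later in depends:
        a, b = earlier[1], later[1]
        if a in active_states and b in active_states:
            if later in active_states[b] and earlier not in active_states[a]:
                return False
    return True


def delete_exact(log: List[Request], to_remove: Set[Request], depends: Set[Dep]) -> List[Request]:
    """Remove `to_remove` and, repeatedly, every logged request depending on a removed one."""
    removed = set(to_remove)
    changed = True
    while changed:
        changed = False
        for r in log:
            if r not in removed and any((d, r) in depends for d in removed):
                removed.add(r)
                changed = True
    return [r for r in log if r not in removed]


x1, x2, y1, z1 = ("x1", "A"), ("x2", "A"), ("y1", "B"), ("z1", "C")
depends = {(x1, y1), (x2, z1)}  # exact dependencies; object C is inactive
active = {"A": {x1}, "B": {y1}}
log = [x1, x2, y1, z1]
stale = {x2}  # logged on active object A but not in A's active state
new_log = delete_exact(log, stale, depends)
```

Because the state is observably consistent, no request in an active state can depend
on the stale request, so the deletion only removes `x2` and the request on the
inactive object C that depends on it.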

Lemma 5.1

When explicit dependency information is available, the deletion transformation
in step JS1 of the JOIN recovery algorithm never causes the algorithm to abort.

Proof: We must show that the deletion transformation never removes from a
server’s log any request that is in the active state of an object. The proof is by
contradiction.

Let $f \in \mathcal{SERV}$ be a server recovering in some observably consistent state,
$S$, of the system. Suppose that during the JOIN phase of server $f$ the deletion
transformation, $\mathrm{delete}_{\mathcal{MR}_{S/f}}$, removes from the log of server $f$ some request, $x.A$,
that is in the active state of object $A$.

$$x.A \in \mathcal{AS}_{S/A}$$

By definition of $\mathcal{MR}_{S/f}$, we know that $x.A \notin \mathcal{MR}_{S/f}$ because $x.A$ is in the
active state of $A$. Request $x.A$ must therefore have been removed from the log
because it depends on some request, $y.B$, in $\mathcal{MR}_{S/f}$.

$$y.B \prec_{\mathcal{R}} x.A$$

However, in order for request $y.B$ to be a member of $\mathcal{MR}_{S/f}$, it must be the case
that object $B$ is active in state $S$ and that $y.B$ is not in the active state of $B$.

$$y.B \notin \mathcal{AS}_{S/B}$$

State $S$ therefore reflects a request, $x.A$, in the active state of an object, $A$,
without reflecting one of its dependents, $y.B$, on another active object, $B$.

$$\mathcal{ACT}_{S/B} \neq \emptyset \qquad \mathcal{ACT}_{S/A} \neq \emptyset$$
$$y.B \notin \mathcal{AS}_{S/B} \qquad x.A \in \mathcal{AS}_{S/A}$$
$$y.B \prec_{\mathcal{R}} x.A$$

State $S$ is therefore observably inconsistent, a contradiction. The deletion
transformation must then have preserved the active states logged for active objects. □

Lemma 5.2

When explicit dependency information is available, the addition transformation
in step JS1 of the JOIN recovery algorithm never causes the algorithm to abort.

Proof: We must show that the addition transformation never adds to a
server’s log any request that is not in the active state of an object. The proof is
by contradiction.

Let $f \in \mathcal{SERV}$ be a server recovering in some observably consistent state,
$S$, of the system. Suppose that during the JOIN phase of server $f$ the addition
transformation, $\mathrm{add}_{\mathcal{MS}_{S/f}}$, adds to the log of server $f$ some request, $x.A \in \mathcal{R}$,
that is not in the active state of object $A$.

$$x.A \notin \mathcal{AS}_{S/A}$$

By definition of $\mathcal{MS}_{S/f}$, we know that $x.A \notin \mathcal{MS}_{S/f}$ because $x.A$ is not in
the active state of $A$. Request $x.A$ must therefore have been added to the log
because it is a dependent of some request, $y.B$, in $\mathcal{MS}_{S/f}$.

$$x.A \prec_{\mathcal{R}} y.B$$

However, in order for request $y.B$ to be a member of $\mathcal{MS}_{S/f}$, it must be the case
that object $B$ is active in state $S$ and that $y.B$ is in the active state of $B$.

$$y.B \in \mathcal{AS}_{S/B}$$

State $S$ therefore reflects a request, $y.B$, in the active state of an object, $B$,
without reflecting one of its dependents, $x.A$, on another active object, $A$.

$$\mathcal{ACT}_{S/A} \neq \emptyset \qquad \mathcal{ACT}_{S/B} \neq \emptyset$$
$$x.A \notin \mathcal{AS}_{S/A} \qquad y.B \in \mathcal{AS}_{S/B}$$
$$x.A \prec_{\mathcal{R}} y.B$$

State $S$ is therefore observably inconsistent, a contradiction. The addition
transformation must then have preserved the active states logged for active objects. □


5.3.2  ACTIVATE  Simplification 

We now show that the log transformations in step AS1 of the ACTIVATE
algorithm do not corrupt the logged states of active objects when exact dependency
information is available. Because exact dependency information is available, we
assume that the new state, $\mathcal{NS}_{S/A}$, for the object being activated is constructed
using the true definition of safety and not an estimate.

Activated Object

We begin by showing that the transformations always correctly install, at the
recovering servers, the new state of the object being activated. This is done
in two lemmas analogous to those in the preceding sub-section. It follows from
these lemmas that the first test in step AS1 of the ACTIVATE algorithm can be
omitted when exact dependency information is available.

Lemma 5.3

When explicit dependency information is available, the deletion transformation
in step AS1 of the ACTIVATE recovery algorithm never corrupts the new state
logged for the object being activated.

Proof: We must show that the deletion transformation never removes from a
recovering server’s log any request that is in the new state for the object being
activated. The proof is by contradiction.

Let $S$ be an observably consistent state in which some object, $A \in \mathcal{OBJS}$, is
being activated. Suppose that during the ACTIVATE phase at some server, $f$
($f \in \mathcal{REC}_{S/A}$), the deletion transformation $\mathrm{delete}_{\mathcal{MR}_{S/f}(A)}$ removes from the log
of server $f$ some request, $x.A$, that is in the new state for object $A$.

$$x.A \in \mathcal{NS}_{S/A}$$

Because $x.A$ is in $\mathcal{NS}_{S/A}$, it cannot be in $\mathcal{MR}_{S/f}(A)$. Request $x.A$ must
therefore have been removed from the log because it depends on some request,
$y.A$, in $\mathcal{MR}_{S/f}(A)$.

$$y.A \prec_{\mathcal{R}} x.A$$

Further, because request $y.A$ is in $\mathcal{MR}_{S/f}(A)$, it cannot be in $\mathcal{NS}_{S/A}$.

$$y.A \notin \mathcal{NS}_{S/A}$$

Now, request $y.A$ must be in $\mathcal{IS}_{S/A}$ because it is in $(\mathcal{L}_{S/f}, \rightarrow_{S/f})$ (the log of
a recovering server of object $A$). To see that $y.A$ is in $(\mathcal{L}_{S/f}, \rightarrow_{S/f})$, note that
request $x.A$ is in $(\mathcal{L}_{S/f}, \rightarrow_{S/f})$ and so, by definition of consistency, the log must
also contain all of the object $A$ dependents of $x.A$, including $y.A$.

Because $y.A$ is in $\mathcal{IS}_{S/A}$ but not in $\mathcal{NS}_{S/A}$, it must be unsafe (by definition
of $\mathcal{NS}_{S/A}$). Because $y.A$ is a dependent of $x.A$, request $x.A$ must also be unsafe.
However, $x.A$ is included in $\mathcal{NS}_{S/A}$, contradicting the fact that $\mathcal{NS}_{S/A}$ contains
only safe requests.

The deletion transformation must therefore have preserved the new logged
state for object $A$. □

Lemma 5.4

When explicit dependency information is available, the addition
transformation in step AS1 of the ACTIVATE recovery algorithm never corrupts the new
state logged for the object being activated.

Proof: We must show that the addition transformation never adds to a
recovering server’s log any request, on the object being activated, that is not in
that object’s new state. The proof is by contradiction.

Let $S$ be an observably consistent state in which some object, $A \in \mathcal{OBJS}$, is
being activated. Suppose that during the ACTIVATE phase at some server, $f$
($f \in \mathcal{REC}_{S/A}$), the addition transformation $\mathrm{add}_{\mathcal{MS}_{S/f}(A)}$ adds to the log of server
$f$ some request, $x.A$, that is not in the new state ($\mathcal{NS}_{S/A}$) for object $A$.

$$x.A \notin \mathcal{NS}_{S/A}$$

Because $x.A$ is not in $\mathcal{NS}_{S/A}$, it cannot be in $\mathcal{MS}_{S/f}(A)$. Request $x.A$ must
therefore have been added to the log because it is a dependent of some request,
$y.A$, in $\mathcal{MS}_{S/f}(A)$.

$$x.A \prec_{\mathcal{R}} y.A$$

Further, because request $y.A$ is in $\mathcal{MS}_{S/f}(A)$, it must also be in $\mathcal{NS}_{S/A}$.

$$y.A \in \mathcal{NS}_{S/A}$$

We now show that request $x.A$ is unsafe. To see this, first note that request
$x.A$ must be in the log of some recovering server of object $A$. This follows from
the fact that $y.A$ is in the log of some recovering server, $g \in \mathcal{REC}_{S/A}$, of object
$A$ (because $y.A$ is in $\mathcal{NS}_{S/A}$ and therefore also in $\mathcal{IS}_{S/A}$, which is formed by
merging the logs of the recovering object $A$ servers) and from the fact that the
log of server $g$ is consistent, and so must contain all of the object $A$ dependents
of $y.A$, including $x.A$.

Now, because $x.A$ is in $(\mathcal{L}_{S/g}, \rightarrow_{S/g})$ (the log of a recovering server of $A$), it
must be in $\mathcal{IS}_{S/A}$. However, $x.A$ was omitted from $\mathcal{NS}_{S/A}$. The only reason this
could happen is because $x.A$ is unsafe.

Because request $x.A$ is unsafe, and request $y.A$ depends on $x.A$, request $y.A$
must also be unsafe. However, $y.A$ is included in $\mathcal{NS}_{S/A}$, contradicting the fact
that $\mathcal{NS}_{S/A}$ only contains safe requests.

The addition transformation must therefore have preserved the new logged
state for object $A$. □

Other  Active  Objects 

We  now  show  that  the  logged  states  of  other  active  objects  at  the  recovering 
servers  are  not  corrupted  by  the  log  transformations.  Again,  we  do  this  in 
two  lemmas.  It  follows  from  these  lemmas  that  the  second  test  in  step  AS1 
of  the  ACTIVATE  algorithm  is  unnecessary  when  exact  dependency  information 
is  available. 

Lemma 5.5

When explicit dependency information is available, the deletion transformation
in step AS1 of the ACTIVATE recovery algorithm never corrupts the logged
state of any previously active object.

Proof: Let $S$ be an observably consistent state in which some object, $A \in \mathcal{OBJS}$,
is being activated, and let $B$ denote any other active object in state $S$. We
must show that for any recovering server, $f$, of object $A$, if $f$ is an active server
of $B$ ($f \in \mathcal{REC}_{S/A} \cap \mathcal{ACT}_{S/B}$) then the deletion transformation does not remove
from $f$’s log any request on object $B$.

The proof is by contradiction. Suppose that the deletion transformation
$\mathrm{delete}_{\mathcal{MR}_{S/f}(A)}$ removes from the log of server $f$ some request, $y.B$, on object
$B$. We show that state $S$ would then be observably inconsistent.

Because $S$ is observably consistent, all active servers of $B$ in state $S$, including
$f$, reflect the active state of $B$. Because $y.B$ is reflected in the log of $f$, it follows
that $y.B$ is part of the active state of $B$.

$$y.B \in \mathcal{AS}_{S/B}$$

In order for the deletion transformation to remove request $y.B$ from the log of
server $f$, $y.B$ must be dependent on some object $A$ request, $x.A$, that is removed
from the log.

$$x.A \in \mathcal{MR}_{S/f}(A) \qquad x.A \prec_{\mathcal{R}} y.B \tag{5.1}$$

Because $x.A$ is in $\mathcal{MR}_{S/f}(A)$, it cannot be part of the new state of object $A$.

$$x.A \notin \mathcal{NS}_{S/A}$$

Because $x.A$ is in the log of a recovering server of object $A$, but not included in
the new state of that object, request $x.A$ must be unsafe. That is, request $x.A$ is
dependent on some other request (for an active object), $z.C$, that is not part of
that object’s active state.

$$z.C \prec_{\mathcal{R}} x.A \qquad z.C \notin \mathcal{AS}_{S/C} \tag{5.2}$$

By transitivity (from 5.1 and 5.2), request $y.B$ is dependent on request $z.C$.
The state of object $B$ (an active object) therefore reflects a request, $y.B$, that
is dependent on a request, $z.C$, not reflected in the state of object $C$ (another
active object).

$$z.C \prec_{\mathcal{R}} y.B$$
$$z.C \notin \mathcal{AS}_{S/C} \qquad y.B \in \mathcal{AS}_{S/B}$$
$$\mathcal{ACT}_{S/C} \neq \emptyset \qquad \mathcal{ACT}_{S/B} \neq \emptyset$$

This contradicts the original assumption that state $S$ is observably
consistent. The deletion transformation could not therefore have removed any object
$B$ request from the log of server $f$. □

Lemma 5.6

When explicit dependency information is available, the addition
transformation in step AS1 of the ACTIVATE recovery algorithm never corrupts the
logged state of any previously active object.

Proof: Let $S$ be an observably consistent state in which some object, $A \in \mathcal{OBJS}$,
is being activated, and let $B$ denote any other active object in state $S$. We
must show that for any recovering server, $f$, of object $A$, if $f$ is an active server
of $B$ ($f \in \mathcal{REC}_{S/A} \cap \mathcal{ACT}_{S/B}$) then the addition transformation does not add
any object $B$ request to the log of server $f$.

The proof is by contradiction. Suppose that the addition transformation
$\mathrm{add}_{\mathcal{MS}_{S/f}(A)}$ adds to the log of server $f$ some request, $y.B$, on object $B$. We
show that the new state for object $A$ contains an unsafe request.

Because $S$ is observably consistent, all active servers of $B$ in state $S$, including
$f$, reflect the active state of $B$. Because $y.B$ is added to the log of $f$ (and so was
not originally present in the log), it follows that $y.B$ is not part of the active state
of $B$.

$$y.B \notin \mathcal{AS}_{S/B}$$

Request $y.B$ can only have been added to the log by the addition
transformation if $y.B$ is a dependent of some object $A$ request, $x.A$, that was also added to
the log.

$$x.A \in \mathcal{MS}_{S/f}(A)$$
$$y.B \prec_{\mathcal{R}} x.A$$

Because $x.A$ is in $\mathcal{MS}_{S/f}(A)$, it is part of the new state of object $A$.

$$x.A \in \mathcal{NS}_{S/A}$$

The new state for object $A$ ($\mathcal{NS}_{S/A}$) therefore reflects a request, $x.A$, that is
dependent on an object $B$ request, $y.B$, that is not reflected in the active state
of $B$ (an active object).

$$y.B \prec_{\mathcal{R}} x.A$$
$$y.B \notin \mathcal{AS}_{S/B} \qquad x.A \in \mathcal{NS}_{S/A}$$
$$\mathcal{ACT}_{S/B} \neq \emptyset \qquad \mathcal{ACT}_{S/A} = \emptyset$$

That is, the new state for object $A$ reflects an unsafe request, $x.A$, contradicting
the fact that $\mathcal{NS}_{S/A}$ only contains safe requests. The addition transformation
could not therefore have added any object $B$ request to the log of server $f$. □


5.4  Summary 

Based on the log transformations of chapter 4, we detailed algorithms for solving
the JOIN and ACTIVATE recovery problems. We began by describing
algorithms for solving the problems when exact dependency information is not
available. These algorithms used dependency estimates to derive consistent object
and replica states when a server recovered from a failure. It was proved that
these algorithms preserve observable consistency in a system.

Because only estimates of the true request dependencies were used, these
algorithms could inadvertently corrupt the logged states of objects. The
algorithms therefore had to test for corrupted states and abort if such states occurred.
However, it was shown that when exact dependency information is available to
the algorithms, no corruption of logged states occurs. Most of the tests in the
recovery algorithms can then be omitted when such information is available.


Chapter  6 

Estimating  Dependencies 


When explicit dependency information is not available in a system, the recovery
algorithms of chapter 5, as well as the log transformations on which they
depend, can use estimates of the dependencies between requests. However, in order
to guarantee that consistency is preserved in a system, the algorithms require
that the estimates used are always sound. In this chapter we present several
dependency estimates having this property.

The estimates are divided into two classes: basic and compound. Basic
estimates are simple estimates designed to approximate the set of direct dependencies
between requests.

Definition 6.1

A dependency between two requests, $x.A \prec_{\mathcal{R}} y.B$, under a request structure
$(\mathcal{R}, \prec_{\mathcal{R}})$, is said to be direct if there is no intervening request, $z.C$, through
which $x.A$ and $y.B$ are related. Formally,

$$\nexists\, z.C \in \mathcal{R}\ (z.C \neq x.A \land z.C \neq y.B) : x.A \prec_{\mathcal{R}} z.C \land z.C \prec_{\mathcal{R}} y.B$$

The basic estimates are formed by examining individual logs for evidence of
request orderings. Compound estimates are more complicated estimates designed
to approximate the set of transitive dependencies between requests.

Definition 6.2

A dependency between two requests, $x.A \prec_{\mathcal{R}} y.B$, under a request structure
$(\mathcal{R}, \prec_{\mathcal{R}})$, is said to be transitive if it is not direct.

The compound estimates are formed by combining the results of the basic
estimates in order to derive indirect (transitive) dependencies between requests.
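The distinction between direct and transitive dependencies can be illustrated with a
short Python sketch. It assumes, for the purpose of the example only, that the
dependency relation is given as a transitively closed set of (earlier, later) pairs;
the function name `is_direct` is hypothetical.

```python
from typing import Set, Tuple

Request = Tuple[str, str]  # (request id, object name)
Dep = Tuple[Request, Request]  # (earlier, later): later depends on earlier


def is_direct(x: Request, y: Request, depends: Set[Dep]) -> bool:
    """True if y depends on x and no third request lies between them."""
    if (x, y) not in depends:
        return False
    requests = {r for pair in depends for r in pair}
    return not any(
        z != x and z != y and (x, z) in depends and (z, y) in depends
        for z in requests
    )


x, y, z = ("x", "A"), ("y", "B"), ("z", "C")
depends = {(x, y), (y, z), (x, z)}  # x before y before z, closed under transitivity
```

In this example the dependencies of `y` on `x` and of `z` on `y` are direct, while the
dependency of `z` on `x` is transitive because `y` intervenes.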

6.1  Potential  Dependencies 

Although  we  do  not  assume  that  the  recovery  mechanism  is  given  any  explicit 
information  about  the  dependencies  between  requests,  we  do  assume  that  it  is 
given  some  general  information  about  potential  dependencies  between  objects. 
In  particular,  we  assume  that  the  recovery  mechanism  has  access  to  a  potential 
dependency  relation. 

Definition 6.3

A potential dependency relation, $\leadsto_{\mathcal{R}}$, over request structure $(\mathcal{R}, \prec_{\mathcal{R}})$ is a
binary relation on the objects in $\mathcal{OBJS}$ with the property that it relates all
pairs of objects between which direct dependencies hold.

$$\forall x.A, y.B \in \mathcal{R} : \text{direct } x.A \prec_{\mathcal{R}} y.B \Rightarrow A \leadsto_{\mathcal{R}} B$$

A potential dependency relation is only an approximation of the direct
dependencies that may hold between the states of objects. A potential dependency
relation may relate objects between which dependencies do not hold.

$$A \leadsto_{\mathcal{R}} B \;\not\Rightarrow\; \exists x.A, y.B \in \mathcal{R} : x.A \prec_{\mathcal{R}} y.B$$
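The defining property can be checked mechanically when the direct dependencies are
known, as in the Python sketch below. The representation is an assumption for this
example: a potential dependency relation is a set of (object, object) pairs and the
direct dependencies are a set of (earlier, later) request pairs. The sample relations
are hypothetical.

```python
from typing import Set, Tuple

Request = Tuple[str, str]  # (request id, object name)


def covers_direct_dependencies(
    potential: Set[Tuple[str, str]],
    direct_deps: Set[Tuple[Request, Request]],
) -> bool:
    """True if every direct dependency x.A before y.B has (A, B) in the relation."""
    return all((x[1], y[1]) in potential for x, y in direct_deps)


direct_deps = {(("x", "A"), ("y", "B"))}
exact = {("A", "B")}  # relates only the objects that are really dependent
loose = {("A", "B"), ("B", "C"), ("A", "C")}  # valid, but relates extra pairs
wrong = {("B", "C")}  # misses the dependency between A and B
```

Both `exact` and `loose` satisfy the definition; `loose` merely relates objects between
which no dependency holds, which the definition permits.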

The accuracy with which a potential dependency relation reflects the actual
dependencies between objects is determined by the application’s programmer, who
is responsible for providing the recovery mechanism with the potential
dependency relation it uses. The programmer should provide the recovery mechanism
with the best potential dependency relation that they can construct, based on
their knowledge of the application’s semantics. In the worst case, the
programmer will be unable to determine which objects will be related and so will produce a
potential dependency relation in which all objects are potentially related. We will
use the notation $\leadsto^{+}_{\mathcal{R}}$ to refer to the transitive closure of a potential dependency
relation $\leadsto_{\mathcal{R}}$.

In order to help ensure that each direct dependency in an application is
represented in the order of requests within some log of the system, the server sets
of potentially related objects are restricted so that they overlap.

Overlap Restriction

$$\forall A, B \in \mathcal{OBJS} : A \leadsto_{\mathcal{R}} B \Rightarrow \mathcal{SERV}_A \cap \mathcal{SERV}_B \neq \emptyset$$

There is therefore a tradeoff between the accuracy of a potential dependency
relation and the structural restrictions placed on the server sets: any extraneous
dependency reflected in the potential dependency relation forces the server sets
of the objects involved to unnecessarily overlap. In order to maximize the
flexibility of the system structure, it is important that the application’s programmer
provides the most accurate potential dependency relation possible.
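The cost of an inaccurate relation can be seen by checking the overlap restriction
against a given placement of replicas. The Python sketch below is illustrative only;
the server names, the `servers_of` mapping, and the function name are assumptions.

```python
from typing import Dict, List, Set, Tuple


def overlap_violations(
    potential: Set[Tuple[str, str]],
    servers_of: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Return the potentially related object pairs whose server sets do not overlap."""
    return [
        (a, b)
        for (a, b) in sorted(potential)
        if not (servers_of[a] & servers_of[b])
    ]


servers_of = {"A": {"f", "g"}, "B": {"g"}, "C": {"h"}}
accurate = {("A", "B")}
all_related = {("A", "B"), ("A", "C"), ("B", "C")}  # worst-case relation
```

Under the accurate relation this placement is acceptable. Under the worst-case
relation, object C would have to share a server with both A and B, even though no
dependency involving C need exist.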

As an example, consider a system containing three objects: $A$, $B$, and $C$.
Suppose that an application runs under the following request structure:

Request Structure: $(\mathcal{R}, \prec_{\mathcal{R}})$

$\mathcal{R} = \{x.A, y.B, z.C\}$
$x.A \prec_{\mathcal{R}} y.B$

Figure 6.1 depicts three potential dependency relations that are consistent with
this request structure. Only potential dependency relation (c) accurately reflects
the request structure of the application.

Figure 6.1: Three consistent potential dependency relations, panels (a), (b), and (c)


6.2  Basic  Estimates 

Because the orders of requests in servers' logs are consistent with the request
structure of an application, these orders can provide information about the
dependencies between requests. The basic dependency estimates are designed to
search servers' logs for such information. We begin this section by detailing an
estimate for determining when two requests are not dependent. This estimate is
then used to construct another estimate for determining a request's set of causal
dependents.

We assume that when a server fails, all information located at that server
becomes inaccessible to the rest of the system. As a result, the recovery mechanism
can only use information present in the logs of functioning servers (non-failed
servers) when constructing dependency estimates.

Definition  6.4 

The set of functioning servers of object A in state S is:

$$\mathcal{FUNC}_{S/A} = \mathcal{ACT}_{S/A} \cup \mathcal{REC}_{S/A}$$

6.2.1 Request Ordering

The causal consistency condition on logs guarantees that when a server logs
some request, y.B, it has previously logged all requests (on objects with replicas
managed by the server) on which y.B depends. It follows then that if a server logs
request x.A after request y.B, then request y.B cannot be dependent on request
x.A. Further, if a server of objects A and B logs y.B without logging x.A, then
request y.B cannot be dependent on x.A.

In  addition,  the  observable  consistency  condition  on  states  guarantees  that  if 
a  request,  y.B ,  is  reflected  in  the  active  state  of  an  object,  B,  then  any  request 
on  which  it  is  dependent,  x.A ,  is  reflected  in  the  active  state  of  its  object,  A 
(provided  object  A  is  active).  It  follows  that  if  both  objects  A  and  B  are  active, 
and  y.B  is  reflected  in  the  active  state  of  B  but  x.A  is  not  reflected  in  the  active 
state  of  A,  then  request  y.B  is  not  dependent  on  request  x.A. 

Combining this intuition with the dependency information provided by
the potential dependency relation, we can estimate when two requests (x.A and
y.B) are not related. We let $\mathrm{con}^{0}_{S}(x.A \prec y.B)$ denote this basic estimate.

Definition  6.5 

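Let $(\mathcal{R}, \prec_{\mathcal{R}})$ be a request structure, let $\rightsquigarrow_{\mathcal{R}}$ be a potential dependency relation
consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$, and let S be a system state consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$.
The request ordering, $x.A \prec y.B$, is directly contradicted in state S, denoted
$\mathrm{con}^{0}_{S}(x.A \prec y.B)$, if any of the following four conditions holds:

1. $A \not\rightsquigarrow^{*}_{\mathcal{R}} B$

2. $\exists f \in \mathcal{FUNC}_{S/A} \cap \mathcal{FUNC}_{S/B} : x.A, y.B \in \mathcal{L}_{S/f} \wedge y.B \rightarrow_{S/f} x.A$

3. $\exists f \in \mathcal{FUNC}_{S/A} \cap \mathcal{FUNC}_{S/B} : y.B \in \mathcal{L}_{S/f} \wedge x.A \notin \mathcal{L}_{S/f}$

4. $\mathcal{ACT}_{S/A} \neq \emptyset \wedge \mathcal{ACT}_{S/B} \neq \emptyset \wedge x.A \notin \mathcal{AS}_{S/A} \wedge y.B \in \mathcal{AS}_{S/B}$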




This estimate is sound. When an ordering, $x.A \prec y.B$,
is found to be directly contradicted, it is guaranteed that y.B is not dependent
on x.A. However, if the ordering is not found to be contradicted, the requests
may or may not be ordered.

Theorem  6.1 

For any request structure $(\mathcal{R}, \prec_{\mathcal{R}})$, potential dependency relation $\rightsquigarrow_{\mathcal{R}}$
consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$, system state S consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$, and pair of
requests $x.A, y.B \in \mathcal{R}$:

$$\mathrm{con}^{0}_{S}(x.A \prec y.B) \Rightarrow x.A \not\prec_{\mathcal{R}} y.B$$

Proof: The proof is by contradiction. Suppose that requests x.A and y.B
are related ($x.A \prec_{\mathcal{R}} y.B$), but that the order is found to be directly contradicted
($\mathrm{con}^{0}_{S}(x.A \prec y.B)$).

Because the order is directly contradicted, at least one of the four conditions
in the estimate definition must hold. If the first condition holds ($A \not\rightsquigarrow^{*}_{\mathcal{R}} B$),
then the potential dependency relation is inconsistent with $(\mathcal{R}, \prec_{\mathcal{R}})$. If either
the second or third condition holds, then the log of server f is inconsistent with
$(\mathcal{R}, \prec_{\mathcal{R}})$. Finally, if the fourth condition holds, then the system state is observably
inconsistent with $(\mathcal{R}, \prec_{\mathcal{R}})$.

In each case, an inconsistency would exist in the system (contradicting the
assumption that the system is consistent) and so the theorem assertion must
hold. □

As an example, consider the system shown in figure 6.2. Depicted are the logs
of two servers, f and g, along with a potential dependency relation. Server f
manages replicas of objects A and B, while server g manages replicas of objects
A and C. Suppose that in addition to those requests present in the logs, the
system also contains a fourth request, z.C, on object C. Table 6.1 summarizes
the request orderings that are directly contradicted by this system, if all objects
are inactive. The orderings are broken down according to the conditions of the
estimate definition that caused them to be contradicted. Note that the following
orderings are not directly contradicted anywhere in the system:

$w.A \prec z.C$     $x.A \prec y.B$     $x.A \prec z.C$     $z.C \prec y.B$

[Figure 6.2: An example of direct contradiction. Log of server f (objects A, B), in order: x.A, y.B, w.A. Log of server g (objects A, C), in order: w.A, x.A. The figure also shows the potential dependency relation over objects A, B, and C.]

Table 6.1: Directly Contradicted Request Orderings

Condition 1          Condition 2          Condition 3
$y.B \prec w.A$      $x.A \prec w.A$      $z.C \prec w.A$
$y.B \prec x.A$      $w.A \prec x.A$      $z.C \prec x.A$
$y.B \prec z.C$      $y.B \prec x.A$
                     $w.A \prec y.B$
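The four conditions of Definition 6.5 can be sketched in Python as follows. This is only an illustrative sketch: the representation of requests as (name, object) pairs, of logs as ordered lists, and all function and variable names are assumptions of this example, and every server listed is taken to be functioning. The potential dependency relation used for the figure 6.2 system (A to B, C to A, A to C) is likewise an assumption, chosen because it reproduces table 6.1.

```python
from itertools import permutations


def transitive_closure(edges):
    """Transitive closure of a set of (from_object, to_object) pairs."""
    closure = set(edges)
    while True:
        extra = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if extra <= closure:
            return closure
        closure |= extra


def directly_contradicted(x, y, logs, managed, potential, active):
    """Sketch of the basic estimate: is 'y depends on x' directly contradicted?

    logs:      functioning server -> list of requests in logged order
    managed:   functioning server -> set of objects it holds replicas of
    potential: set of (A, B) pairs, A potentially precedes B
    active:    active object -> set of requests reflected in its active state
    """
    a, b = x[1], y[1]
    # Condition 1: the objects are not related, even transitively.
    if (a, b) not in transitive_closure(potential):
        return True
    for server, log in logs.items():
        if a in managed[server] and b in managed[server]:
            # Condition 2: both requests logged, but y was logged before x.
            if x in log and y in log and log.index(y) < log.index(x):
                return True
            # Condition 3: y was logged without x.
            if y in log and x not in log:
                return True
    # Condition 4: both objects active, y reflected but x not reflected.
    if a in active and b in active:
        if x not in active[a] and y in active[b]:
            return True
    return False


# The system of figure 6.2, with all objects inactive.
w, x, y, z = ("w", "A"), ("x", "A"), ("y", "B"), ("z", "C")
logs = {"f": [x, y, w], "g": [w, x]}
managed = {"f": {"A", "B"}, "g": {"A", "C"}}
potential = {("A", "B"), ("C", "A"), ("A", "C")}  # assumed relation

not_contradicted = {
    (p, q)
    for p, q in permutations([w, x, y, z], 2)
    if not directly_contradicted(p, q, logs, managed, potential, active={})
}
```

Under these assumptions, `not_contradicted` holds exactly the four orderings listed above, and the remaining orderings are those of table 6.1.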

6.2.2  Dependency  Set 

Using the preceding estimate, we can now construct an estimate of $\mathcal{DEP}_B(x.A)$,
the object B dependents of request x.A. Again, this estimate is based on the
consistency restrictions placed on logs and system states.

From the causal consistency condition on logs, we know that if a server of
objects A and B logs request x.A, then it previously has logged all of the object
B dependents of x.A. The set of object B requests preceding x.A in a log can
therefore be used as an estimate of the true set of dependents. From the
observable consistency condition on system states, we know that if both objects A
and B are active, and the active state of A reflects request x.A, then the active
state of B must reflect all of the object B dependents of x.A. In this case, the
set of requests in the active state of B can also be used as an estimate of the
dependency set.

Of course, not all of the object B requests in these estimates may be
dependents of x.A. There may be information in the system that contradicts the
ordering between x.A and some of the object B requests. This information can
be used to further refine the estimates.


Definition  6.6 

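Let $(\mathcal{R}, \prec_{\mathcal{R}})$ be a request structure, let $\rightsquigarrow_{\mathcal{R}}$ be a potential dependency
relation consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$, and let S be a system state observably
consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$. For any object $B \in \mathit{OBJS}$ and request $x.A \in \mathcal{R}$, the
basic estimated dependents of x.A are:

$$
\mathrm{dep}^{0}_{S/B}(x.A) =
\begin{cases}
\bot & \text{if } \neg \exists f \in \mathcal{FUNC}_{S/A} \cap \mathcal{FUNC}_{S/B} : x.A \in \mathcal{L}_{S/f} \\
 & \text{and } ( \mathcal{ACT}_{S/A} = \emptyset \vee \mathcal{ACT}_{S/B} = \emptyset \vee x.A \notin \mathcal{AS}_{S/A} ) \\
\emptyset & \text{if } B \not\rightsquigarrow^{*}_{\mathcal{R}} A \\
\{\, y.B \mid \neg \mathrm{con}^{0}_{S}(y.B \prec x.A) \wedge {} & \text{o.w.} \\
\quad [\, \exists f \in \mathcal{FUNC}_{S/A} \cap \mathcal{FUNC}_{S/B} : x.A, y.B \in \mathcal{L}_{S/f} & \\
\quad \vee\ y.B \in \mathcal{AS}_{S/B} \,] \,\} &
\end{cases}
$$

Like the first basic estimate, the dependency set estimate has the property
that it is sound.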

Theorem  6.2 

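Let $(\mathcal{R}, \prec_{\mathcal{R}})$ be any request structure, $\rightsquigarrow_{\mathcal{R}}$ be any potential dependency relation
consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$, and S be any system state observably consistent with
$(\mathcal{R}, \prec_{\mathcal{R}})$. For any request $x.A \in \mathcal{R}$ and object $B \in \mathit{OBJS}$, if $\mathrm{dep}^{0}_{S/B}(x.A)$ is
defined then:

$$\mathcal{DEP}_B(x.A) \subseteq \mathrm{dep}^{0}_{S/B}(x.A)$$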


Proof: The proof is by contradiction. Suppose that $\mathrm{dep}^{0}_{S/B}(x.A)$ is defined,
but that there exists some dependent, y.B, of request x.A that is not included in
$\mathrm{dep}^{0}_{S/B}(x.A)$.

$$y.B \in \mathcal{DEP}_B(x.A) \wedge y.B \notin \mathrm{dep}^{0}_{S/B}(x.A)$$

There are three conditions under which $\mathrm{dep}^{0}_{S/B}(x.A)$ is defined:

Case 1: $B \not\rightsquigarrow^{*}_{\mathcal{R}} A$

In this case, the potential dependency relation does not reflect the real
dependency between x.A and y.B, and so is inconsistent with the request
structure of the application. This contradicts the assumption that the potential
dependency relation is consistent.

Case 2: $B \rightsquigarrow^{*}_{\mathcal{R}} A \wedge \exists f \in \mathcal{FUNC}_{S/A} \cap \mathcal{FUNC}_{S/B} : x.A \in \mathcal{L}_{S/f}$

Because the log of server f contains request x.A, and because the state of the
system is causally consistent, the log of server f must also contain request
y.B.

$$y.B \in \mathcal{L}_{S/f}$$

From the definition of the dependency set estimate, the only reason y.B
could then be omitted from the estimate is because the ordering between it
and request x.A is directly contradicted somewhere in the system.

$$\mathrm{con}^{0}_{S}(y.B \prec x.A) = \text{true}$$

However, from theorem 6.1, this implies that the two requests are unrelated.

$$y.B \not\prec_{\mathcal{R}} x.A$$

This contradicts the assumption that y.B is a real dependent of x.A.

Case 3: $B \rightsquigarrow^{*}_{\mathcal{R}} A \wedge \mathcal{ACT}_{S/A} \neq \emptyset \wedge \mathcal{ACT}_{S/B} \neq \emptyset \wedge x.A \in \mathcal{AS}_{S/A}$

Because both objects A and B are active, and the active state of A reflects
x.A, and because the system state is observably consistent, the active state
of B must reflect all of the object B dependents of request x.A, including
y.B.

$$y.B \in \mathcal{AS}_{S/B}$$

From the definition of the dependency set estimate, the only reason y.B
could then be omitted from the estimate is because the ordering between it
and request x.A is directly contradicted somewhere in the system.

$$\mathrm{con}^{0}_{S}(y.B \prec x.A) = \text{true}$$

However, from theorem 6.1, this implies that the two requests are unrelated.

$$y.B \not\prec_{\mathcal{R}} x.A$$

This contradicts the assumption that y.B is a real dependent of x.A.

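In each case, a contradiction occurs and so the original assumption must be
incorrect. The estimate must therefore always include all true dependents when
defined. □

[Figure 6.3: An example of basic dependency set estimation. Log of server f (objects A, B), in order: x.A, y.B, w.A. Log of server g (objects A, C), in order: w.A, z.C, x.A. The figure also shows the potential dependency relation over objects A, B, and C.]

Table 6.2: Basic Estimated Dependents

Object    w.A            x.A            y.B            z.C
A         $\emptyset$    $\emptyset$    x.A            w.A
B         $\emptyset$    $\emptyset$    $\emptyset$    $\emptyset$
C         $\emptyset$    z.C            $\bot$         $\emptyset$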

As  an  example,  consider  the  system  shown  in  figure  6.3.  This  system  is 
identical  to  the  system  shown  in  figure  6.2,  except  that  server  g  has  logged  request 
z.C  between  requests  w.A  and  x.A.  For  each  request  in  the  system,  table  6.2  shows 
the  basic  estimated  dependents  on  objects  A,  B,  and  C. 
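The entries of table 6.2 can be reproduced with a small Python sketch of Definition 6.6. The sketch rests on several assumptions that are not stated in the text: requests are (name, object) pairs, every listed server is functioning, all objects are inactive, the potential dependency relation is A to B, C to A, A to C, the "unrelated objects" case takes precedence over the undefined case, and a request is never counted among its own dependents. `UNDEFINED`, `con0`, and `dep0` are illustrative names.

```python
UNDEFINED = None  # stands for an undefined estimate


def closure(edges):
    """Transitive closure of a set of (from_object, to_object) pairs."""
    result = set(edges)
    while True:
        extra = {(a, d) for (a, b) in result for (c, d) in result if b == c}
        if extra <= result:
            return result
        result |= extra


def con0(x, y, logs, managed, star, active):
    """Basic estimate: is the ordering 'y depends on x' directly contradicted?"""
    a, b = x[1], y[1]
    if (a, b) not in star:
        return True
    for f, log in logs.items():
        if {a, b} <= managed[f]:
            if x in log and y in log and log.index(y) < log.index(x):
                return True
            if y in log and x not in log:
                return True
    return (a in active and b in active
            and x not in active[a] and y in active[b])


def dep0(b, x, logs, managed, star, active):
    """Basic estimated object-b dependents of request x (None if undefined)."""
    a = x[1]
    if (b, a) not in star:
        return set()  # assumed to take precedence over the undefined case
    shared = [f for f in logs if {a, b} <= managed[f]]
    logged = any(x in logs[f] for f in shared)
    observed = a in active and b in active and x in active[a]
    if not (logged or observed):
        return UNDEFINED
    candidates = {y for f in shared if x in logs[f]
                  for y in logs[f] if y[1] == b}
    candidates |= set(active.get(b, set()))
    candidates.discard(x)  # assumption: a request is not its own dependent
    return {y for y in candidates
            if not con0(y, x, logs, managed, star, active)}


# The system of figure 6.3, with all objects inactive.
w, x, y, z = ("w", "A"), ("x", "A"), ("y", "B"), ("z", "C")
logs = {"f": [x, y, w], "g": [w, z, x]}
managed = {"f": {"A", "B"}, "g": {"A", "C"}}
star = closure({("A", "B"), ("C", "A"), ("A", "C")})  # assumed relation

table = {(obj, r): dep0(obj, r, logs, managed, star, {})
         for obj in "ABC" for r in (w, x, y, z)}
```

With these inputs, the only non-empty entries are the object A dependents of y.B and z.C and the object C dependents of x.A, and the object C dependents of y.B are undefined because no functioning server manages both B and C.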

6.3  Compound  Estimates 

Requests are not always directly related. Two requests, $x_1.A_1$ and $x_n.A_n$, can be
related through a sequence of dependencies on other requests in the system.

$$x_1.A_1 \prec_{\mathcal{R}} x_2.A_2 \prec_{\mathcal{R}} \ldots \prec_{\mathcal{R}} x_n.A_n$$

The information necessary to detect these transitive dependencies may be
embedded across multiple logs in the system. For example, the above transitive
dependency might embed itself across $n - 1$ logs.

[Figure 6.4: Non-optimal transitive closure. Log of server $f_1$ (objects A, B), in order: x.A, y.B. Log of server $f_2$ (objects B, C), in order: y.B, z.C. Log of server $f_3$ (objects A, C), in order: x.A, z.C. Log of server $f_4$ (objects A, C), in order: z.C, x.A. The figure also shows the potential dependency relation over objects A, B, and C.]


The compound estimates combine the results of the basic estimates in order
to detect such transitive dependencies. By combining the results of the basic
estimates, the compound estimates are able to approximate the sequences out of
which the transitive dependencies are built.

An obvious method for estimating transitive dependencies is to simply take
the transitive closure of the basic estimates. This method is not entirely accurate,
however. For example, consider the system shown in figure 6.4. This figure
depicts a system with four servers ($f_1$, $f_2$, $f_3$, and $f_4$), three objects (A, B, and
C), and three requests (x.A, y.B, and z.C). Applying the basic estimates, we
determine that two orderings are possible:

$x.A \prec_{\mathcal{R}} y.B$     $y.B \prec_{\mathcal{R}} z.C$

By taking the transitive closure, we would also estimate that request z.C is
dependent on request x.A, even though the logs of servers $f_3$ and $f_4$ directly contradict
any ordering between the two requests. The compound estimates presented in
this section detect contradictions, such as the one between requests z.C and x.A,
and use them to form more accurate approximations when combining the basic
estimates.

We refer to the sequence of objects over which a transitive dependency may
be embedded as a chain.

Definition 6.7

A chain, H, is a sequence of potentially dependent objects.

$$H = A_1 \rightsquigarrow_{\mathcal{R}} A_2 \rightsquigarrow_{\mathcal{R}} \ldots \rightsquigarrow_{\mathcal{R}} A_n$$


Definition 6.8

A sub-chain of a chain, H,

$$H = A_1 \rightsquigarrow_{\mathcal{R}} A_2 \rightsquigarrow_{\mathcal{R}} \ldots \rightsquigarrow_{\mathcal{R}} A_n$$

is any subsequence of its objects

$$H' = A_{m_1} \quad A_{m_2} \quad \ldots \quad A_{m_p}$$

where $1 \leq m_1 < m_2 < \ldots < m_p \leq n$.


Definition 6.9

The $A_i A_j$ sub-chain of a chain, H, is the sub-chain of objects from $A_i$ to $A_j$:

$$H_{i..j} = A_i \quad A_{i+1} \quad \ldots \quad A_j$$

Definition 6.10

The length of a chain or sub-chain, H, denoted $\|H\|$, is the number of objects
in the sequence.


6.3.1  Dependency  Set 

In this subsection we present our compound estimate of $\mathcal{DEP}_B(x.A)$, the object
B dependents of request x.A, which we denote as $\mathrm{dep}^{+}_{S/B}(x.A)$. This estimate
is constructed by estimating the object B dependents of x.A that occur along
each chain from object B to object A, and then combining the results from the
different chains.

We begin by describing our estimate of the dependents that occur along a
particular chain, H.

$$H = A_1 \rightsquigarrow_{\mathcal{R}} A_2 \rightsquigarrow_{\mathcal{R}} \ldots \rightsquigarrow_{\mathcal{R}} A_n$$

For any request, $x_n.A_n$, we let $\mathrm{dep}^{+}_{S/H}(x_n.A_n)$ denote our estimate in state S of
the object $A_1$ dependents of $x_n.A_n$ that occur along chain H. This estimate can
be formed in many ways, depending upon which servers are functioning in state
S. First, if there is a functioning server of objects $A_1$ and $A_n$ that has logged
request $x_n.A_n$, the basic estimate can be applied to determine the dependency
set. In general, however, the server sets of objects $A_1$ and $A_n$ will not overlap,
unless the objects are directly related.

Alternately, an estimate can be formed by sub-dividing the problem as shown
in figure 6.5. First, an object in the chain, $A_i$ ($1 < i < n$), is selected. Next, the
object $A_i$ dependents of $x_n.A_n$ are estimated. Finally, the object $A_1$ dependents
of the object $A_i$ dependents are estimated to produce the desired dependency
set. Again, if the server sets of objects $A_1$ and $A_i$ overlap, and if the server sets
of objects $A_i$ and $A_n$ overlap, the basic estimates can be applied to solve each
of the sub-problems. The result is a dependency set estimate obtained along the
sub-chain:

$$A_1 \quad A_i \quad A_n$$

If the server sets do not overlap, each of the sub-problems must be further
sub-divided until the basic estimates can be applied. In general, the problem is
sub-divided until a sub-chain of H is found

$$A_1 \quad A_{m_1} \quad A_{m_2} \quad \ldots \quad A_{m_p} \quad A_n$$

$$1 < m_1 < m_2 < \ldots < m_p < n$$

in which each pair of adjacent objects have overlapping server sets.

This procedure is summarized in the following recursive estimate definition.
Note that the estimate has been extended to operate on sets of requests. In
particular, if Q is a set of object $A_n$ requests, then $\mathrm{dep}^{+}_{S/H}(Q)$ denotes the estimated
set of object $A_1$ dependents of the requests in Q.

$$
\mathrm{dep}^{+}_{S/H}(Q) =
\begin{cases}
\bigcup_{x_n.A_n \in Q} \mathrm{dep}^{0}_{S/A_1}(x_n.A_n) & \text{if defined} \\
\bigcup_{x_n.A_n \in Q} \mathrm{dep}^{+}_{S/H_{1..i}}(\mathrm{dep}^{+}_{S/H_{i..n}}(x_n.A_n)) & \text{o.w.}
\end{cases}
$$

where $1 < i < n$ is chosen so that the estimates
$\mathrm{dep}^{+}_{S/H_{1..i}}$ and $\mathrm{dep}^{+}_{S/H_{i..n}}$ are defined.

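Note also that the definitions of union and intersection (intersection is used later
in this section) must be altered to take into account the possibility of undefined
sets.

$$
\bigcup_i S_i =
\begin{cases}
\bot & \text{if } \exists i : S_i = \bot \\
\bigcup_i S_i & \text{o.w.}
\end{cases}
$$

$$
\bigcap_i S_i =
\begin{cases}
\bot & \text{if } \forall i : S_i = \bot \\
\bigcap_{\{ i \mid S_i \neq \bot \}} S_i & \text{o.w.}
\end{cases}
$$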

The choice of object, $A_i$, at which to sub-divide a problem can affect the final
estimate. Different object choices can yield slightly different approximations.
When an estimate is defined, though, it is guaranteed to be sound. It follows
that an accurate approximation of the dependency set (one with few extraneous
requests) can be formed by intersecting the estimates from each of the different
sub-division choices. The complete dependency set estimate along chain H is
given below.

Definition 6.11

Let $(\mathcal{R}, \prec_{\mathcal{R}})$ be a request structure, let $\rightsquigarrow_{\mathcal{R}}$ be a potential dependency relation
consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$, and let S be a system state consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$.
For any chain,

$$H = A_1 \rightsquigarrow_{\mathcal{R}} A_2 \rightsquigarrow_{\mathcal{R}} \ldots \rightsquigarrow_{\mathcal{R}} A_n$$

and set of object $A_n$ requests, Q, the estimated dependents of Q along chain
H are:

$$
\mathrm{dep}^{+}_{S/H}(Q) =
\begin{cases}
\bigcup_{x_2.A_2 \in Q} \mathrm{dep}^{0}_{S/A_1}(x_2.A_2) & \text{if } \|H\| = 2 \\
\bigcup_{x_n.A_n \in Q} \Big[\, \mathrm{dep}^{0}_{S/A_1}(x_n.A_n) \cap \big[ \bigcap_{1 < i < n} \mathrm{dep}^{+}_{S/H_{1..i}}(\mathrm{dep}^{+}_{S/H_{i..n}}(x_n.A_n)) \big] \Big] & \text{if } \|H\| > 2
\end{cases}
$$


Theorem  6.3 

When it is defined, $\mathrm{dep}^{+}_{S/H}(Q)$ does not under-estimate the true set of
dependencies along chain H.

Proof: Let $x_n.A_n$ denote any request in Q. Suppose that $\mathrm{dep}^{+}_{S/H}(Q)$ is defined
and that the system contains a transitive dependency along chain H.

$$x_1.A_1 \prec_{\mathcal{R}} x_2.A_2 \prec_{\mathcal{R}} \ldots \prec_{\mathcal{R}} x_n.A_n$$

We show by induction on the length of the chain that $\mathrm{dep}^{+}_{S/H}(Q)$ contains $x_1.A_1$.

Base Case: $\|H\| = 2$

The dependency set estimate is the union of basic estimates.

$$\bigcup_{x_2.A_2 \in Q} \mathrm{dep}^{0}_{S/A_1}(x_2.A_2)$$

By assumption this union is defined, and so each of the component basic
estimates must also be defined, including $\mathrm{dep}^{0}_{S/A_1}(x_2.A_2)$. From theorem 6.2,
$\mathrm{dep}^{0}_{S/A_1}(x_2.A_2)$ contains all object $A_1$ dependents of request $x_2.A_2$, including
$x_1.A_1$. It follows then that request $x_1.A_1$ is included in the union.

Induction Step: $\|H\| = n > 2$

Suppose that the theorem holds for all chains with length less than n.

For a chain of length n, the dependency set estimate is the union of
components, each of which in turn is an intersection of estimates. We show that
one of these components, specifically the one shown below, contains request
$x_1.A_1$. It then follows that the overall union contains $x_1.A_1$.

$$\mathrm{dep}^{0}_{S/A_1}(x_n.A_n) \cap \Big[ \bigcap_{1 < i < n} \mathrm{dep}^{+}_{S/H_{1..i}}(\mathrm{dep}^{+}_{S/H_{i..n}}(x_n.A_n)) \Big]$$

In order to show that this component contains the desired request, we show
that each element in the intersection (when defined) contains the request.
First, consider the estimate $\mathrm{dep}^{0}_{S/A_1}(x_n.A_n)$. From theorem 6.2, this
estimate (when defined) contains all of the object $A_1$ dependents of request
$x_n.A_n$, including $x_1.A_1$.

Now, consider any of the remaining elements, $\mathrm{dep}^{+}_{S/H_{1..i}}(\mathrm{dep}^{+}_{S/H_{i..n}}(x_n.A_n))$,
that is defined. By the induction hypothesis, $\mathrm{dep}^{+}_{S/H_{i..n}}(x_n.A_n)$ contains all
of the object $A_i$ dependents of request $x_n.A_n$ that occur along chain $H_{i..n}$,
including request $x_i.A_i$. Applying the induction hypothesis again, we see
that $\mathrm{dep}^{+}_{S/H_{1..i}}(\mathrm{dep}^{+}_{S/H_{i..n}}(x_n.A_n))$ contains all of the object $A_1$ dependents
of $x_i.A_i$ that occur along chain $H_{1..i}$, including $x_1.A_1$.

□
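The recursive structure of Definition 6.11, together with the altered union and intersection, can be sketched in Python. This is a hedged illustration only: the basic estimate is passed in as a function `basic(obj, request)` that returns a set or `None` for "undefined", and all names (`dep_chain`, `union_all`, `intersect_defined`, `make_basic`) are assumptions of the sketch. The example tables mimic the figure 6.4 situation and are made up for illustration.

```python
UNDEFINED = None  # stands for an undefined estimate


def union_all(sets):
    """Union that is undefined as soon as any operand is undefined."""
    result = set()
    for s in sets:
        if s is UNDEFINED:
            return UNDEFINED
        result |= s
    return result


def intersect_defined(sets):
    """Intersection over the defined operands; undefined only if all are."""
    defined = [s for s in sets if s is not UNDEFINED]
    if not defined:
        return UNDEFINED
    result = set(defined[0])
    for s in defined[1:]:
        result &= s
    return result


def dep_chain(chain, requests, basic):
    """Estimated dependents on chain[0] of `requests` (on chain[-1])."""
    first = chain[0]
    if len(chain) == 2:
        return union_all(basic(first, q) for q in requests)
    per_request = []
    for q in requests:
        options = [basic(first, q)]  # direct basic estimate, if any
        for i in range(1, len(chain) - 1):  # each sub-division point
            middle = dep_chain(chain[i:], {q}, basic)
            if middle is UNDEFINED:
                options.append(UNDEFINED)
            else:
                options.append(dep_chain(chain[:i + 1], middle, basic))
        per_request.append(intersect_defined(options))
    return union_all(per_request)


def make_basic(table):
    """Wrap a dict {(object, request): set} as a basic-estimate function."""
    return lambda obj, request: table.get((obj, request), UNDEFINED)


# Made-up basic estimates: the A/C servers contradict x.A before z.C.
with_direct = make_basic({
    ("A", "z.C"): set(), ("B", "z.C"): {"y.B"}, ("A", "y.B"): {"x.A"},
})
# The same system when no functioning server manages both A and C.
without_direct = make_basic({("B", "z.C"): {"y.B"}, ("A", "y.B"): {"x.A"}})
```

Intersecting the sub-division results with the direct basic estimate is what removes x.A in the first case; when the direct estimate is undefined, the sketch falls back on the sub-divided estimate alone.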


The general estimate of the object B dependents of a request, x.A, is formed
by unioning the estimated dependents along all chains from B to A. We denote
the set of all chains from object B to object A as $BA\text{-}\mathcal{CHAINS}$.

Definition 6.12

Let $(\mathcal{R}, \prec_{\mathcal{R}})$ be a request structure, let $\rightsquigarrow_{\mathcal{R}}$ be a potential dependency relation
consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$, and let S be a system state consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$.
For any object, B, and request, x.A, the estimated object B dependents of
request x.A are:

$$\mathrm{dep}^{+}_{S/B}(x.A) = \bigcup_{H \in BA\text{-}\mathcal{CHAINS}} \mathrm{dep}^{+}_{S/H}(x.A)$$

Theorem 6.4

When it is defined, $\mathrm{dep}^{+}_{S/B}(x.A)$ does not under-estimate the true set of
dependents.

$$\mathcal{DEP}_B(x.A) \subseteq \mathrm{dep}^{+}_{S/B}(x.A)$$

Proof: By definition, any object B dependent, y.B, of request x.A is
dependent along some chain, H, from B to A. From theorem 6.3, the estimated
dependents along chain H include y.B. It follows that any object B dependent
of x.A is included in the union. □

6.3.2  Request  Ordering 

Now consider the problem of estimating when two requests, x.A and y.B, are
unrelated. We let $\mathrm{con}^{+}_{S}(x.A \prec y.B)$ denote our compound estimate of the
predicate that request y.B is not causally dependent on request x.A. This estimate
is constructed in a manner similar to the preceding compound estimate. First,
the relationship of the two requests is estimated along each chain from object A
to object B. The results of the estimates are then combined to form an overall
estimate of whether the two requests are related.




We let $\mathrm{con}^{+}_{S/H}(x_1.A_1 \prec x_n.A_n)$ denote our estimate of the predicate that
request $x_n.A_n$ is not causally dependent on request $x_1.A_1$ along chain H.

$$H = A_1 \rightsquigarrow_{\mathcal{R}} A_2 \rightsquigarrow_{\mathcal{R}} \ldots \rightsquigarrow_{\mathcal{R}} A_n$$

The idea behind the construction of this estimate is to search the chain for an
object, $A_i$, such that none of the object $A_i$ dependents of $x_n.A_n$ are dependent
on $x_1.A_1$. The existence of such an object implies that request $x_n.A_n$ is not
transitively dependent on request $x_1.A_1$ through a sequence of requests on objects
that include $A_i$. Because H contains $A_i$, this in turn implies that the requests
are not related along chain H.

The estimate is formed by examining each object, $A_i$, in the chain. For each
such object, the dependents of request $x_n.A_n$ are estimated. Each of these
dependents is then recursively tested to determine if it is dependent on request
$x_1.A_1$. The complete estimate definition is given below. Note that the definition
is extended to operate on sets of requests. In particular, if Q is a set of object
$A_n$ requests, then $\mathrm{con}^{+}_{S/H}(x_1.A_1 \prec Q)$ denotes our estimate of the predicate that
none of the requests in Q are dependent on $x_1.A_1$ along chain H.

Definition 6.13

Let $(\mathcal{R}, \prec_{\mathcal{R}})$ be a request structure, let $\rightsquigarrow_{\mathcal{R}}$ be a potential dependency relation
consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$, and let S be a system state consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$.
For any chain,

$$H = A_1 \rightsquigarrow_{\mathcal{R}} A_2 \rightsquigarrow_{\mathcal{R}} \ldots \rightsquigarrow_{\mathcal{R}} A_n$$

request $x_1.A_1$, and set of object $A_n$ requests Q, the dependency of Q on request
$x_1.A_1$ along chain H is contradicted in state S, denoted $\mathrm{con}^{+}_{S/H}(x_1.A_1 \prec Q)$,
if the following condition holds.

$$
\begin{cases}
\bigwedge_{x_2.A_2 \in Q} \mathrm{con}^{0}_{S}(x_1.A_1 \prec x_2.A_2) & \text{if } \|H\| = 2 \\
\bigwedge_{x_n.A_n \in Q} \Big[\, \mathrm{con}^{0}_{S}(x_1.A_1 \prec x_n.A_n) \vee \big( \bigvee_{1 < i < n} \mathrm{con}^{+}_{S/H_{1..i}}(x_1.A_1 \prec \mathrm{dep}^{+}_{S/H_{i..n}}(x_n.A_n)) \big) \Big] & \text{o.w.}
\end{cases}
$$

Theorem  6.5 

If $\mathrm{con}^{+}_{S/H}(x_1.A_1 \prec Q)$ holds, then there does not exist any request in Q that
is dependent on $x_1.A_1$ along chain H.

Proof: The proof is by contradiction. Suppose that $\mathrm{con}^{+}_{S/H}(x_1.A_1 \prec Q)$ holds,
but that there exists a request, $x_n.A_n$, in Q that is dependent on $x_1.A_1$ through
a sequence of dependencies along chain H.

$$x_1.A_1 \prec_{\mathcal{R}} x_2.A_2 \prec_{\mathcal{R}} \ldots \prec_{\mathcal{R}} x_n.A_n$$

We show by induction on the length of chain H that an inconsistency exists.

Base Case: $\|H\| = 2$

Because request $x_n.A_n$ is dependent on request $x_1.A_1$ ($x_1.A_1 \prec_{\mathcal{R}} x_n.A_n$), we
know from theorem 6.1 that $\mathrm{con}^{0}_{S}(x_1.A_1 \prec x_n.A_n)$ is false. Because this is
one of the conjuncts in the definition of $\mathrm{con}^{+}_{S/H}(x_1.A_1 \prec Q)$, it follows that
the compound estimate is false, contradicting the assumption that it is true.

Induction Step: $\|H\| = n > 2$

Suppose that the theorem holds for all chains with length less than n. We
show that the conjunct corresponding to request $x_n.A_n$ in the definition
of $\mathrm{con}^{+}_{S/H}(x_1.A_1 \prec Q)$ is false. It then follows that the overall compound
estimate is false, contradicting the assumption that the estimate is true.

We show that the conjunct is false by showing that each of its disjuncts is
false. First, from theorem 6.1 we know that

$$\mathrm{con}^{0}_{S}(x_1.A_1 \prec x_n.A_n) = \text{false}$$

Now, consider any of the disjuncts $\mathrm{con}^{+}_{S/H_{1..i}}(x_1.A_1 \prec \mathrm{dep}^{+}_{S/H_{i..n}}(x_n.A_n))$.
From theorem 6.3, we know that when it is defined $\mathrm{dep}^{+}_{S/H_{i..n}}(x_n.A_n)$
contains all of the object $A_i$ dependents of $x_n.A_n$, including $x_i.A_i$. Because $x_i.A_i$
is dependent on $x_1.A_1$ ($x_1.A_1 \prec_{\mathcal{R}} x_i.A_i$), we know by the induction hypothesis
that

$$\mathrm{con}^{+}_{S/H_{1..i}}(x_1.A_1 \prec x_i.A_i) = \text{false}$$

It therefore follows that

$$\mathrm{con}^{+}_{S/H_{1..i}}(x_1.A_1 \prec \mathrm{dep}^{+}_{S/H_{i..n}}(x_n.A_n)) = \text{false}$$

□
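Definition 6.13 can be sketched in Python in the same style. This is an illustrative sketch under stated assumptions: `con0(x, y)` stands for the basic contradiction estimate, `dep(chain, requests)` for a dependency set estimate along a chain that returns `None` when undefined, and an undefined estimate is assumed simply not to contribute a disjunct. The function names and the example data are hypothetical.

```python
def con_chain(chain, x1, requests, con0, dep):
    """Is the dependency of `requests` on x1 along `chain` contradicted?"""
    if len(chain) == 2:
        return all(con0(x1, q) for q in requests)

    def via_some_middle(q):
        # Look for an object A_i none of whose estimated dependents
        # (of q) can be dependent on x1.
        for i in range(1, len(chain) - 1):
            middle = dep(chain[i:], {q})
            if middle is not None and con_chain(
                chain[:i + 1], x1, middle, con0, dep
            ):
                return True
        return False

    return all(con0(x1, q) or via_some_middle(q) for q in requests)


def example_dep(chain, requests):
    """Made-up estimate: y.B is the only object B dependent of z.C."""
    return {"y.B"} if chain == ["B", "C"] else None


def make_con0(contradicted):
    """Wrap a set of directly contradicted (x, y) orderings as a function."""
    return lambda x, y: (x, y) in contradicted
```

For instance, if the ordering of x.A before y.B is directly contradicted, then no dependency of z.C on x.A can pass through object B, and the sketch reports the chain as contradicted; with no direct contradictions it does not.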


The general compound estimate of the relationship between two requests, x.A
and y.B, is formed by combining the estimates of the requests' relationship along
individual chains.

Definition 6.14

Let $(\mathcal{R}, \prec_{\mathcal{R}})$ be any request structure, let $\rightsquigarrow_{\mathcal{R}}$ be a potential dependency
relation consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$, and let S be a system state consistent with
$(\mathcal{R}, \prec_{\mathcal{R}})$. For any pair of requests, x.A and y.B, the dependency of y.B on x.A
is contradicted in state S if $\mathrm{con}^{+}_{S}(x.A \prec y.B)$ holds.

$$\mathrm{con}^{+}_{S}(x.A \prec y.B) = \bigwedge_{H \in AB\text{-}\mathcal{CHAINS}} \mathrm{con}^{+}_{S/H}(x.A \prec y.B)$$


Theorem 6.6

$\mathrm{con}^{+}_{S}(x.A \prec y.B)$ does not under-estimate the true set of related requests.

$$\mathrm{con}^{+}_{S}(x.A \prec y.B) \Rightarrow x.A \not\prec_{\mathcal{R}} y.B$$


Proof: We show the contrapositive. Suppose that request y.B is causally
dependent on request x.A ($x.A \prec_{\mathcal{R}} y.B$). By definition, the two requests are
related along some chain, H, from object A to object B. From theorem 6.5, we
know that

$$\mathrm{con}^{+}_{S/H}(x.A \prec y.B) = \text{false}$$

Because this is one of the conjuncts in the definition of $\mathrm{con}^{+}_{S}(x.A \prec y.B)$, it
follows that

$$\mathrm{con}^{+}_{S}(x.A \prec y.B) = \text{false}$$

□


6.3.3  Safety 

Our last compound estimate approximates the safety predicate $\mathcal{SAFE}_S(x.A)$.
Recall that, when true, the safety predicate indicates that the dependents (on
active objects) of request x.A are reflected in their objects' current states. Like the
other compound estimates, the safety estimate is formed by combining estimates
of safety along individual chains that lead to object A.

For any request $x_n.A_n$, active object $A_1$ ($\mathcal{ACT}_{S/A_1} \neq \emptyset$), and chain H from
object $A_1$ to object $A_n$,

$$H = A_1 \rightsquigarrow_{\mathcal{R}} A_2 \rightsquigarrow_{\mathcal{R}} \ldots \rightsquigarrow_{\mathcal{R}} A_n$$

we let $\mathrm{safe}^{+}_{S/H}(x_n.A_n)$ denote our estimate of the predicate that all object $A_1$
dependents of request $x_n.A_n$ (along chain H) are reflected in the active state
of $A_1$. One method for constructing this estimate is to approximate the object
$A_1$ dependents of request $x_n.A_n$ (using one of the preceding estimates) and then
check to see if all of those estimated dependents are reflected in the state of
$A_1$. However, this method will only work when the dependency set estimate is
defined.

Another method for constructing the estimate is to examine each active object
$A_i$ ($\mathcal{ACT}_{S/A_i} \neq \emptyset$) in the chain, estimate the object $A_i$ dependents of request
$x_n.A_n$, and then check to see if all of these dependents are reflected in the active
state of object $A_i$. The intuition behind this method is that if $x_n.A_n$ is safe along
chain H then all of its object $A_i$ dependents are also safe along chain H. If one
of these object $A_i$ dependents were unsafe, then it would not be reflected in the
active state of $A_i$, because the state of $A_i$ would be inconsistent with the state
of $A_1$.

Definition 6.15

Let $(\mathcal{R}, \prec_{\mathcal{R}})$ be a request structure, let $\rightsquigarrow_{\mathcal{R}}$ be a potential dependency relation
consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$, and let S be a system state consistent with $(\mathcal{R}, \prec_{\mathcal{R}})$.
For any request $x_n.A_n$, active object $A_1$ ($\mathcal{ACT}_{S/A_1} \neq \emptyset$), and chain H from
$A_1$ to $A_n$,

$$H = A_1 \rightsquigarrow_{\mathcal{R}} A_2 \rightsquigarrow_{\mathcal{R}} \ldots \rightsquigarrow_{\mathcal{R}} A_n$$

request $x_n.A_n$ is estimated to be safe along chain H in state S if the predicate
$\mathrm{safe}^{+}_{S/H}(x_n.A_n)$ holds.

$$\mathrm{safe}^{+}_{S/H}(x_n.A_n) = \exists i : [\, \mathcal{ACT}_{S/A_i} \neq \emptyset \wedge \mathrm{dep}^{+}_{S/H_{i..n}}(x_n.A_n) \subseteq \mathcal{AS}_{S/A_i} \,]$$

Theorem  6.7 

If $\mathrm{safe}^{+}_{S/H}(x_n.A_n)$ is true, then all object $A_1$ dependents of request $x_n.A_n$ along
chain H are reflected in the active state of object $A_1$.

Proof: The proof is by contradiction. Suppose that $\mathrm{safe}^{+}_{S/H}(x_n.A_n)$ is true,
but that there is an object $A_1$ request, $x_1.A_1$, that is a dependent of request $x_n.A_n$
along chain H

$$x_1.A_1 \prec_{\mathcal{R}} x_2.A_2 \prec_{\mathcal{R}} \ldots \prec_{\mathcal{R}} x_n.A_n$$

but is not reflected in the active state of object $A_1$.

$$x_1.A_1 \notin \mathcal{AS}_{S/A_1}$$

Because $\mathrm{safe}^{+}_{S/H}(x_n.A_n)$ is true, we know from its definition that there exists
some active object, $A_i$, in the chain such that

$$\mathrm{dep}^{+}_{S/H_{i..n}}(x_n.A_n) \subseteq \mathcal{AS}_{S/A_i}$$

From theorem 6.3, we know that $\mathrm{dep}^{+}_{S/H_{i..n}}(x_n.A_n)$ contains all of the object
$A_i$ dependents of $x_n.A_n$ that occur along chain H, including $x_i.A_i$. It therefore
follows that request $x_i.A_i$ is reflected in the active state of object $A_i$.

$$x_i.A_i \in \mathcal{AS}_{S/A_i}$$

However, request $x_i.A_i$ is also dependent on request $x_1.A_1$. The state of object
$A_i$ (an active object) therefore reflects a request ($x_i.A_i$) that is dependent on
an object $A_1$ request that is not reflected in that object's active state.

The state of the system is therefore observably inconsistent, contradicting the
assumption that it is observably consistent. □
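The safety estimate along a chain can be sketched in the same hedged way. In this Python sketch, `dep(chain, requests)` again stands for a dependency set estimate that returns `None` when undefined, `active_state` maps each active object to the set of requests reflected in its active state, and the index i is assumed to range over every object of the chain except the last. All names and the example data are hypothetical.

```python
def safe_chain(chain, xn, active_state, dep):
    """Estimate whether request xn is safe along `chain`.

    chain:        list of objects A_1 ... A_n, with xn a request on A_n
    active_state: active object -> set of requests in its active state
    dep:          dep(sub_chain, requests) -> set of requests, or None
    """
    for i, obj in enumerate(chain[:-1]):
        if obj not in active_state:
            continue  # only active objects can witness safety
        estimate = dep(chain[i:], {xn})
        if estimate is not None and estimate <= active_state[obj]:
            return True
    return False


def example_dep(chain, requests):
    """Made-up estimates of the dependents of z.C along each sub-chain."""
    if chain[0] == "A":
        return {"x.A"}  # possibly an over-estimate
    if chain[0] == "B":
        return {"y.B"}
    return None


chain = ["A", "B", "C"]
```

In this made-up example the estimate for object A contains a request that A's active state does not reflect, yet safety can still be established through the intermediate active object B, which is the point of examining every active object in the chain.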

The  general  estimate  of  the  safety  of  a  request,  x.A ,  is  constructed  by  com¬ 
bining  the  estimates  of  the  request’s  safety  along  all  chains  to  A  from  active 
objects. 

Definition  6.16 

Let  (7 Z,  An)  be  a  request  structure,  let  be  a  potential  dependency  relation 
consistent  with  (7Z,  An),  and  let  S  be  a  system  state  consistent  with  (7Z.  ~< /j ). 
A  request,  x.A,  is  estimated  to  be  safe  in  state  S  if  the  predicate  safe^x..-!) 
holds. 

safe^(x.A)  =  A  A  sa  tes/H(x.A) 

{  BtzOBJS  |  ACT  si  git*  }  H  €  BA-CHA1\'S 


Theorem  6.8 

If $safe'_S(x.A)$ holds, then request x.A is safe in state $S$.

$$safe'_S(x.A) \Rightarrow \mathcal{SAFE}_S(x.A)$$


Proof: We show the contrapositive. Suppose that request x.A is unsafe in
state $S$. Then request x.A is dependent on some other request, y.B, on an active
object ($\mathcal{ACT}_{S/B} \neq \emptyset$) that is not reflected in the object's active state.

$$y.B \notin \mathcal{AS}_{S/B}$$

By definition, this dependency must occur along some chain, $H$, from object B to
object A. From theorem 6.7, the predicate $safe'_{S/H}(x.A)$ must be false. Because
this is one of the conjuncts in the definition of $safe'_S(x.A)$, the compound safety
estimate must also be false. □

6.4  Using  the  Estimates 

Both the basic and compound estimates can be substituted directly into the
recovery mechanism as shown below. Because the estimates all have the property
that they are sound, they can be used in place of the values of $\mathcal{CON}(x.A \prec y.B)$,
$\mathcal{DEP}_B(x.A)$, and $\mathcal{SAFE}_S(x.A)$ without modification of the algorithms.

              $\mathcal{CON}(x.A \prec y.B)$     $\mathcal{DEP}_B(x.A)$     $\mathcal{SAFE}_S(x.A)$

    Basic     $con^0_S(x.A \prec y.B)$           $dep^0_{S/B}(x.A)$

    Compound  $con'_S(x.A \prec y.B)$            $dep'_{S/B}(x.A)$          $safe'_S(x.A)$

The  compound  estimates  have  the  advantage  that  they  are  more  often  defined 
than  the  basic  estimates.  However,  the  basic  estimates  are  less  expensive  to 
compute. 

If there is insufficient information in the system to form an estimate required
by the recovery mechanism (i.e. the estimate is undefined), the mechanism must
block and wait for additional servers to recover and provide enough information
to construct the estimate. If the undefined estimate occurs in the JOIN phase
of recovery, the entire recovery sequence must block. If the undefined estimate
occurs in the ACTIVATE phase, then only the activation of the object that
required the estimate must block; the recovery mechanism can proceed with the
activation of other objects.
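
The difference between the two phases can be sketched in Python. This is only an illustration under stated assumptions: an estimate is modeled as a function that returns None when it is undefined, and the names activate_objects and join_phase are hypothetical, not the recovery algorithms of chapter 5.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Request = Tuple[str, str]  # (request id, object name), e.g. ("x", "A")
Estimate = Callable[[str], Optional[Set[Request]]]  # None means "undefined"


def join_phase(active: Iterable[str],
               estimate: Estimate) -> Optional[Dict[str, Set[Request]]]:
    """JOIN: one undefined estimate blocks the entire recovery sequence."""
    results: Dict[str, Set[Request]] = {}
    for obj in active:
        result = estimate(obj)
        if result is None:
            return None  # caller must wait for more servers to recover
        results[obj] = result
    return results


def activate_objects(inactive: Iterable[str],
                     estimate: Estimate
                     ) -> Tuple[Dict[str, Set[Request]], List[str]]:
    """ACTIVATE: an undefined estimate blocks only the object that needs it."""
    activated: Dict[str, Set[Request]] = {}
    blocked: List[str] = []
    for obj in inactive:
        result = estimate(obj)
        if result is None:
            blocked.append(obj)  # retry later; other objects still proceed
        else:
            activated[obj] = result
    return activated, blocked
```

A caller would retry join_phase, or the objects returned in blocked, whenever another server recovers and contributes its log.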

6.5  Summary 

In this chapter we presented several methods for estimating the dependencies
between requests when explicit dependency information is not available in the
system. The estimates were divided into two classes: basic estimates and compound
estimates. The basic estimates were simple estimates designed to search
the orders of requests in servers' logs for evidence of request dependencies. The
compound estimates were more complex estimates designed to combine the results
of the basic estimates in order to detect transitive dependencies embedded
across multiple servers' logs.

Both the basic and compound estimates had the property that they were
sound. Because of this, the estimates could be used directly by the log
transformations and recovery algorithms. By using sound estimates, the recovery
mechanism was guaranteed to ensure all true dependencies between requests,
plus possibly a few extraneous orderings. However, because the estimates were
sometimes undefined, the recovery mechanism might occasionally need to block
and wait until sufficient ordering information was available in the logs of
functioning servers to construct the needed estimates.

In order to construct the estimates, we assumed that we were given an
approximation of the dependencies between objects, $A \rightsquigarrow_{\mathcal{R}} B$, called a potential
dependency relation. This relation had the property that it related all objects
that had dependent requests. The relation was not required to be precise,
however. It could relate objects between which no dependencies existed. However,
inaccuracies in a potential dependency relation caused unnecessary restrictions to
be placed on the structure of the system. They also caused undefined estimates
to occur more often.


Chapter  7 

Efficiency  Issues 


In  this  chapter  we  examine  several  issues  regarding  the  efficiency  of  the  recovery 
mechanism.  We  begin  by  describing  a  cyclic  condition  that  can  arise  in  the 
dependency  estimates  and  cause  the  recovery  mechanism  to  block.  By  restricting 
the  structure  of  a  system,  we  show  how  this  cyclic  condition  can  be  avoided.  We 
then  describe  a  special  class  of  systems  that  can  be  recovered  efficiently  without 
blocking  using  only  the  basic  estimates.  Finally,  we  examine  the  problem  of  using 
checkpoints  (of  object  states)  in  the  recovery  mechanism  in  order  to  bound  the 
size  of  logs. 


7.1  Cycle  Restriction 


Even though the dependencies between requests form a partial order, the
estimates sometimes generate cyclic orderings. Consider the three logs and potential
dependency relation shown below.

    Log 1    Log 2    Log 3
    x.A      y.B      z.C
    y.B      z.C      x.A

    $A \rightsquigarrow_{\mathcal{R}} B$    $B \rightsquigarrow_{\mathcal{R}} C$    $C \rightsquigarrow_{\mathcal{R}} A$

From this information, the dependency estimates would generate a cyclic ordering
for the three requests.

$$x.A \prec y.B \prec z.C \prec x.A$$

At  least  one  of  the  estimated  request  dependencies  must  be  spurious.  However, 
based  on  the  information  available  to  the  estimates,  there  is  no  way  of  determining 
which  ordering  it  is. 

If a server of objects A, B, and C recovers and attempts to add the three
requests to its logs, a problem occurs. Without knowing which request ordering
is spurious, any ordering of the three requests within the recovering server's log
potentially violates a true dependency. When this situation arises, the recovering
server must block and wait until another (failed server's) log becomes available
and is able to contradict one of the cyclic orderings.
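
The three-log example can be reproduced with a short Python sketch. It uses a deliberately simplified stand-in for the estimates of chapter 6 (an assumption of this sketch): a request is estimated to precede another whenever it appears earlier in some log and the two objects are potentially dependent. Contradicting evidence from other logs is ignored, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Request = Tuple[str, str]  # (request id, object name)
Edge = Tuple[Request, Request]


def estimated_orderings(logs: List[List[Request]],
                        potential: Set[Tuple[str, str]]) -> Set[Edge]:
    """Collect (earlier, later) pairs suggested by log order.

    potential holds pairs (A, B) meaning requests on B may depend on
    requests on A.
    """
    edges: Set[Edge] = set()
    for log in logs:
        for i, earlier in enumerate(log):
            for later in log[i + 1:]:
                if (earlier[1], later[1]) in potential:
                    edges.add((earlier, later))
    return edges


def find_cycle(edges: Set[Edge]) -> Optional[List[Request]]:
    """Return one cycle as a closed path of requests, or None if acyclic."""
    graph: Dict[Request, List[Request]] = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    white, grey, black = 0, 1, 2
    colour = {node: white for node in graph}
    path: List[Request] = []

    def visit(node: Request) -> Optional[List[Request]]:
        colour[node] = grey
        path.append(node)
        for nxt in graph[node]:
            if colour[nxt] == grey:  # back edge closes a cycle
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == white:
                found = visit(nxt)
                if found:
                    return found
        colour[node] = black
        path.pop()
        return None

    for node in sorted(graph):
        if colour[node] == white:
            found = visit(node)
            if found:
                return found
    return None
```

With the three logs and the relation shown above, find_cycle reports a closed path through x.A, y.B, and z.C; with only the first two logs it reports no cycle.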

The  problem  of  estimated  cyclic  dependencies  can  be  avoided  by  requiring 
that  any  server  of  an  object  involved  in  a  potential  cycle  must  also  serve  all  other 
objects  in  that  cycle.  Such  a  restriction  can  be  easily  implemented  in  a  system, 
such  as  ISIS  [BCJ+],  that  provides  flexibility  about  which  objects  a  given  server 
manages. 

Cycle  Restriction 

If a cycle exists in the potential dependency relation

$$A_1 \rightsquigarrow_{\mathcal{R}} A_2 \rightsquigarrow_{\mathcal{R}} \dots \rightsquigarrow_{\mathcal{R}} A_n \rightsquigarrow_{\mathcal{R}} A_1$$

then any server that manages one object in the cycle manages all objects in
the cycle.

$$\mathcal{SERV}_{A_1} = \mathcal{SERV}_{A_2} = \dots = \mathcal{SERV}_{A_n}$$

A request, such as x.A above, cannot then be involved in an estimated dependency
cycle because any server that logged x.A would also have logged all of
its dependents along the cycle (y.B and z.C) in some total order within its log,
contradicting at least one of the cyclic orderings.
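
Whether a configuration satisfies the cycle restriction can be checked mechanically. The following Python sketch assumes the potential dependency relation is given as a set of object pairs and the server sets as a dictionary; both representations and the function names are assumptions of this sketch. Two objects lie on a common cycle exactly when each is reachable from the other.

```python
from typing import Dict, List, Set, Tuple


def reachable(potential: Set[Tuple[str, str]], start: str) -> Set[str]:
    """Objects reachable from start by following one or more relation pairs."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for a, b in potential:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def cycle_restriction_violations(potential: Set[Tuple[str, str]],
                                 servers: Dict[str, Set[str]]
                                 ) -> List[Tuple[str, str]]:
    """Pairs of objects on a common cycle whose server sets differ."""
    objects = sorted(servers)
    reach = {obj: reachable(potential, obj) for obj in objects}
    violations: List[Tuple[str, str]] = []
    for i, a in enumerate(objects):
        for b in objects[i + 1:]:
            on_common_cycle = b in reach[a] and a in reach[b]
            if on_common_cycle and servers[a] != servers[b]:
                violations.append((a, b))
    return violations
```

An empty result means the restriction holds for the given configuration.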

7.2  Backward  Inclusion  Systems 

In general, the compound estimates of chapter 6 are fairly expensive to compute.
In order to form a dependency estimate along a particular chain, H, the
compound estimates combine approximations constructed along all sub-chains
(sub-divisions) of H. Because the number of sub-chains of a chain grows
exponentially with the length of the chain, this method can be prohibitively expensive
for even modestly sized chains. This cost can be reduced by employing dynamic
programming techniques [Den82]. However, for long chains, dynamic programming
solutions can also be expensive.

Another  method  for  reducing  the  cost  of  constructing  an  estimate  is  to  limit 
the  lengths  of  the  sub-chains  considered  by  the  estimation  method  to  a  fixed 
maximum  length.  This  has  the  effect  of  reducing  the  number  of  sub-chains  along 
which  estimates  are  computed  to  be  polynomial  in  the  length  of  the  chain.  Of 
course,  limiting  the  number  of  sub-chains  considered  by  the  estimation  method 
increases  the  likelihood  that  an  estimate  will  be  undefined. 

In the extreme, we can limit the estimation method to consider only sub-chains
of length two; that is, we can limit the recovery mechanism to using only
the basic estimates. The basic estimates have the advantage that they are the
least expensive estimates to compute, but the disadvantage that they are the most
likely estimates to be undefined. However, there is a special class of systems in
which the basic estimates are always defined.

Definition  7.1 

A system is a backward inclusion system if it satisfies the following condition:

$$\forall A, B \in \mathcal{OBJS} : A \rightsquigarrow_{\mathcal{R}} B \Rightarrow \mathcal{SERV}_B \subseteq \mathcal{SERV}_A$$


Intuitively, a system is a backward inclusion system if any server that manages
a replica of an object, A, also manages replicas of all objects on which A is
potentially dependent. It follows that if a server logs some request, x.A,
then it also logs every dependent of x.A. Because a request never occurs in a
log without all of its dependents, the basic estimates are always defined and the
recovery mechanism never aborts. Note that backward inclusion systems satisfy
the cycle restriction and so never abort due to cyclic dependency conditions.
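
The condition of definition 7.1 is a simple subset test, sketched below in Python. The pair (a, b) is assumed to mean that object b is potentially dependent on object a, and the dictionary of server sets and the function name are assumptions of this sketch.

```python
from typing import Dict, Set, Tuple


def is_backward_inclusion(potential: Set[Tuple[str, str]],
                          servers: Dict[str, Set[str]]) -> bool:
    """True if every server of b also serves each a that b may depend on."""
    return all(servers[b] <= servers[a] for a, b in potential)
```

For a small hierarchy in which B and C each potentially depend on A, the test passes when the servers of B and of C are all servers of A, and fails as soon as some server manages B without managing A.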

The class of backward inclusion systems consists essentially of hierarchically
organized systems such as the one depicted in figure 7.1. Figure 7.1(a) shows
the potential dependency relation between the six objects in the system and
figure 7.1(b) shows the overlap between the server sets of the six objects. The set
of backward inclusion systems also includes some non-hierarchical systems such
as the one depicted in figure 7.2.

Figure 7.2: A non-hierarchical backward inclusion system


7.3 Checkpointing

As we have presented them, logs grow without bound. In any implementation of
the recovery mechanism, the growth of logs must be limited through the use of
checkpoints. A checkpoint can be logically modeled as a set of requests.

Definition 7.2

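The checkpoint of object A in state S at server f, denoted $\mathcal{CKPT}^A_{S/f}$, is a set
of causally consistent requests on object A.

$$\forall x'.A, x.A \in \mathcal{R}\ (x'.A \prec_{\mathcal{R}} x.A) : x.A \in \mathcal{CKPT}^A_{S/f} \Rightarrow x'.A \in \mathcal{CKPT}^A_{S/f}$$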


In  reality,  the  checkpoint  stored  by  a  server  is  not  a  set  of  requests,  but  a 
compact  representation  of  the  object  state  corresponding  to  that  set  of  updates. 
However,  for  the  purposes  of  discussion,  we  choose  to  model  a  checkpoint  as  a 
set  of  requests. 

A recovering server restores its replica of an object, A, from its log by first
restoring the replica to the checkpointed state and then replaying the logged
requests on object A. In order to ensure that only consistent states are restored
to replicas, the causality condition on logs is extended to include checkpoints.
First, the checkpoints and log of a server are restricted to contain only requests
on objects managed by the server. Second, if a server logs or checkpoints some
request, x.A, then it must previously have logged or checkpointed all dependents
of x.A (on objects managed by the server). Because checkpoints precede all other
entries in a log, this implies that a server that has checkpointed x.A has also
checkpointed the dependents of x.A. Lastly, the checkpoints and log of a server
are restricted from containing any duplicate requests.

Definition  7.3 

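The log, $(\mathcal{L}_{S/f}, \rightarrow_{S/f})$, of a server f in state S is consistent with a request
structure, $(\mathcal{R}, \prec_{\mathcal{R}})$, if

1. $\forall x.A \in \mathcal{L}_{S/f} : A \in \mathcal{OBJS}_f$

   $\forall A \in \mathcal{OBJS}_f : \mathcal{CKPT}^A_{S/f}$ contains only object A requests

2. $\forall x.A \in \mathcal{L}_{S/f} : \forall y.B \in \mathcal{R}\ (y.B \prec_{\mathcal{R}} x.A) :$

   $B \in \mathcal{OBJS}_f \Rightarrow [\, y.B \in \mathcal{CKPT}^B_{S/f} \lor (y.B \in \mathcal{L}_{S/f} \land y.B \rightarrow_{S/f} x.A) \,]$

3. $\forall A, B \in \mathcal{OBJS}_f :$

   $\forall x.A \in \mathcal{CKPT}^A_{S/f} :$

   $\forall y.B \in \mathcal{R}\ (y.B \prec_{\mathcal{R}} x.A) : y.B \in \mathcal{CKPT}^B_{S/f}$

4. $\forall A \in \mathcal{OBJS}_f : \mathcal{CKPT}^A_{S/f} \cap \mathcal{L}_{S/f} = \emptyset$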

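The projection operator is also extended to account for checkpoints in the
following way:

Definition 7.4

The projection of a log, $(\mathcal{L}_{S/f}, \rightarrow_{S/f})$, onto an object, $A \in \mathcal{OBJS}$, is

$$(\mathcal{L}_{S/f}, \rightarrow_{S/f})|_A = \{\, x.A \mid x.A \in \mathcal{L}_{S/f} \lor x.A \in \mathcal{CKPT}^A_{S/f} \,\}$$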
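
Definitions 7.3 and 7.4 can be read as a checker and a projection function. The Python sketch below is illustrative only: a log is assumed to be a list in log order, checkpoints a dictionary from object to a set of requests, and the dependency relation an explicit set of (y, x) pairs meaning that y precedes x. All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Request = Tuple[str, str]  # (request id, object name)


def log_is_consistent(log: List[Request],
                      checkpoints: Dict[str, Set[Request]],
                      managed: Set[str],
                      precedes: Set[Tuple[Request, Request]]) -> bool:
    """Check the four conditions of definition 7.3 for one server."""
    position = {req: i for i, req in enumerate(log)}

    def ckpt(obj: str) -> Set[Request]:
        return checkpoints.get(obj, set())

    # Condition 1: only managed objects; each checkpoint holds its own object.
    if any(obj not in managed for _, obj in log):
        return False
    for obj, reqs in checkpoints.items():
        if obj not in managed or any(o != obj for _, o in reqs):
            return False

    for y, x in precedes:
        if y[1] not in managed:
            continue  # dependents on unmanaged objects are not required
        y_checkpointed = y in ckpt(y[1])
        # Condition 2: a logged x needs y checkpointed or logged earlier.
        if x in position:
            logged_before = y in position and position[y] < position[x]
            if not (y_checkpointed or logged_before):
                return False
        # Condition 3: a checkpointed x needs y checkpointed as well.
        if x in ckpt(x[1]) and not y_checkpointed:
            return False

    # Condition 4: no request is both checkpointed and logged. Repeated
    # list entries are also rejected, since the log is a set of requests.
    if len(position) != len(log):
        return False
    return all(req not in ckpt(req[1]) for req in log)


def project(log: List[Request],
            checkpoints: Dict[str, Set[Request]],
            obj: str) -> Set[Request]:
    """Definition 7.4: logged or checkpointed requests on obj."""
    return {r for r in log if r[1] == obj} | set(checkpoints.get(obj, set()))
```

The precedes argument is assumed to already contain every dependency pair, including transitive ones.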

The main difficulty involved in implementing checkpoints is ensuring that the
causal consistency restrictions are not violated. For example, the log addition
transformation must be careful not to add to a server's log any request that is
already present in that server's checkpoints. Similarly, a checkpoint should never
be installed at a server if that checkpoint reflects a request already present in
the server's log (this can be a problem when a new checkpoint is transferred to
a recovering server during the server's JOIN phase).

These problems can be solved by storing, with each checkpoint, explicit
information about the requests it reflects. Duplicates can then be detected and
removed from the affected log. Due to the large number of requests that may be
reflected in a checkpoint, however, it will generally be impractical to maintain
such explicit information.

Another method for avoiding duplicates is to use implicit information
contained in other servers' logs. For example, if a server, f, known to be consistent,
has logged some request, x.A, then the checkpoint of object A at server f cannot
reflect x.A. It therefore follows that request x.A can be added to the log of any
server, with the same object A checkpoint as f, without introducing a duplicate
into its log. By adapting a checkpointing algorithm such as [KT87], we can
increase the likelihood that servers will have identical checkpoints.

7.4  Summary 

In this chapter we examined several issues concerning the efficiency of the
recovery mechanism. We began by describing a circularity condition that can arise
in the estimates and cause the recovery mechanism to abort. We showed how
this problem could be avoided by restricting the structure of the system. We
then outlined a special class of systems, called backward inclusion systems, that
were efficiently solvable without blocking using the basic estimates. Finally, we
outlined some of the problems involved in adding object checkpoints to server
logs.


Chapter  8 

Grouping  Consistency 


This dissertation has presented a recovery mechanism for preserving causal
consistency in a distributed system. The basic principles of estimating dependencies
between requests and using those estimates to preserve consistency can also be
applied to other forms of consistency. In this chapter we outline changes in the
recovery mechanism for supporting an atomic form of consistency called grouping
consistency.

8.1  Grouping  Consistency 

Under  grouping  consistency,  requests  may  be  collected  into  sets  (called  groups) 
with  the  property  that  no  request  in  a  group  is  reflected  in  the  system  unless  all  of 
the  requests  in  the  group  are  also  reflected.  The  requests  in  a  group  do  not  have 
any  ordering  properties  between  them,  only  the  all-or-none  property.  Grouping 
consistency  differs  from  serializability  in  that  there  are  no  ordering  properties 
between  the  requests  in  different  groups;  they  may  be  received  and  processed  by 
servers  in  any  order. 

As an example of grouping consistency, consider an airline reservation system.
Suppose that a passenger wishes to make a reservation on a pair of connecting
flights. This operation can be implemented as two separate requests. First, a seat
is reserved for the passenger on the first flight, A. Second, a seat is reserved for
the passenger on the connecting flight, B. In order to be consistent, the system
should never reflect one seat reservation without reflecting the other. The two
reservations would therefore be collected into a group and submitted as a unit.

Request Structure: $(\mathcal{R}, =_{\mathcal{R}})$

$\mathcal{R} = \{ res_1.A,\ res_2.B,\ res_3.A,\ res_4.B \}$
$res_1.A =_{\mathcal{R}} res_2.B \qquad res_3.A =_{\mathcal{R}} res_4.B$

Figure 8.1: A grouping request structure

We  can  modify  the  definition  of  a  request  structure  to  reflect  groupings  of 
requests  in  the  following  way. 

Definition  8.1 

A request structure, $(\mathcal{R}, =_{\mathcal{R}})$, is a set of requests along with an equivalence
relation on that set.

Here, $\mathcal{R}$ is the set of client requests and $=_{\mathcal{R}}$ relates all grouped requests. If two
requests are related, $x.A =_{\mathcal{R}} y.B$, then the system must reflect both requests or
neither request. Note that a request may belong to multiple groups. If request
x.A is grouped with request y.B ($x.A =_{\mathcal{R}} y.B$), and request y.B is separately
grouped with request z.C ($y.B =_{\mathcal{R}} z.C$), then by the transitivity of the grouping
relation request x.A cannot be reflected in the system unless request z.C is also
reflected.
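
Because the grouping relation is an equivalence relation, separately declared groups that share a request merge into one class. A minimal Python sketch of this closure, using a union-find structure (a technique chosen for this illustration, with hypothetical names), is:

```python
from typing import Dict, Iterable, List, Set, Tuple

Request = Tuple[str, str]  # (request id, object name)


def group_classes(requests: Iterable[Request],
                  declared_pairs: Iterable[Tuple[Request, Request]]
                  ) -> List[Set[Request]]:
    """Equivalence classes generated by the declared grouping pairs."""
    requests = list(requests)
    parent: Dict[Request, Request] = {r: r for r in requests}

    def find(r: Request) -> Request:
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for a, b in declared_pairs:
        parent[find(a)] = find(b)  # merge the two classes

    classes: Dict[Request, Set[Request]] = {}
    for r in requests:
        classes.setdefault(find(r), set()).add(r)
    return list(classes.values())
```

Grouping x.A with y.B and y.B with z.C yields a single class containing all three requests, so none of them may be reflected without the others.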

Figure 8.1 shows a request structure for the airline reservation system
described above. The system consists of four seat reservations ($res_1.A$, $res_2.B$,
$res_3.A$, and $res_4.B$) on two separate flights (A and B). In the example, $res_2.B$ is
a connecting reservation from $res_1.A$ and $res_4.B$ is a connecting reservation from
$res_3.A$.

We assume servers receive, process, and log grouped requests as a unit. As a
result, server logs are consistent with the group structure on requests. That is,
if the log of a server reflects some request, x.A, then it also reflects all requests
related to x.A (on objects managed by the server).

Definition  8.2 

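The log, $(\mathcal{L}_{S/f}, \rightarrow_{S/f})$, of server f in state S is consistent with a request
structure, $(\mathcal{R}, =_{\mathcal{R}})$, if

1. $\forall x.A \in \mathcal{L}_{S/f} : f \in \mathcal{SERV}_A$

2. $\forall x.A \in \mathcal{L}_{S/f} :$

   $\forall y.B \in \mathcal{R}\ (x.A =_{\mathcal{R}} y.B) : f \in \mathcal{SERV}_B \Rightarrow y.B \in \mathcal{L}_{S/f}$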

As  before,  we  assume  that  servers  recover  in  observably  consistent  states. 
That  is,  at  the  time  of  a  server  recovery,  the  logs  of  all  functioning  servers  are 
consistent  with  the  application’s  request  structure  and  all  active  servers  of  an 
object  reflect  the  same  object  state.  Further,  the  states  of  different  active  objects 
are  mutually  consistent:  if  a  request  is  reflected  in  the  active  state  of  one  object, 
then  all  of  its  dependents  (on  active  objects)  are  reflected  in  their  object's  active 
states. 

Definition  8.3 

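A system state, S, is observably consistent with a request structure, $(\mathcal{R}, =_{\mathcal{R}})$,
if

1. $\forall f \in \mathcal{SERV} - \mathcal{FAIL}_S : (\mathcal{L}_{S/f}, \rightarrow_{S/f})$ is consistent with $(\mathcal{R}, =_{\mathcal{R}})$.

2. $\forall A \in \mathcal{OBJS} : \forall f, g \in \mathcal{ACT}_{S/A} : (\mathcal{L}_{S/f}, \rightarrow_{S/f})|_A = (\mathcal{L}_{S/g}, \rightarrow_{S/g})|_A$

3. $\forall A, B \in \mathcal{OBJS}\ (\mathcal{ACT}_{S/A} \neq \emptyset \land \mathcal{ACT}_{S/B} \neq \emptyset) :$

   $\forall x.A \in \mathcal{AS}_{S/A} : \forall y.B \in \mathcal{R}\ (x.A =_{\mathcal{R}} y.B) : y.B \in \mathcal{AS}_{S/B}$

8.2 Changes to Recovery Mechanism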

Recovery  under  grouping  consistency  is  handled  in  the  same  manner  as  it  was 
under  causal  consistency.  The  recovery  sequence  of  a  server  is  divided  into  two 
phases.  During  the  JOIN  phase,  a  recovering  server  receives  and  installs  the 
current  states  of  active  objects.  During  the  ACTIVATE  phase,  a  recovering 
server  constructs  and  installs  new  (consistent)  states  for  inactive  objects. 

The  algorithms  implementing  the  JOIN  and  ACTIVATE  phases  are  nearly 
identical  to  those  of  chapter  5.  However,  the  log  transformations  on  which  they 
are  built  must  be  modified  to  account  for  the  new  consistency  definition.  Consider 
the  log  addition  transformation.  When  a  request  is  added  to  a  server’s  log,  the 
transformation  must  be  certain  that  all  requests  (directly  or  transitively)  grouped 
with  it  are  also  present  in  the  log.  If  they  are  not,  then  the  transformation  must 
add  them. 

Definition  8.4 

The set of object B dependents of request x.A under grouping consistency is

$$\mathcal{DEP}_B(x.A) = \{\, y.B \in \mathcal{R} \mid y.B =_{\mathcal{R}} x.A \,\}$$

Figure 8.2 shows the complete log addition transformation under grouping
consistency. Note that the transformation places no particular ordering on the requests
in the log because requests are not ordered under grouping consistency.

The deletion transformation is modified in a similar manner. When a request
is deleted from a log, all requests grouped with it are also deleted. The complete
log deletion transformation is shown in figure 8.3. Note that although the
transformation preserves the order of requests that remain in the log, this restriction
is unnecessary.


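$$add_Q(\mathcal{L}_f, \rightarrow_f) = (\mathcal{L}, \rightarrow_{\mathcal{L}})$$

where

$$\mathcal{L} = \mathcal{L}_f \cup Q \cup \Big( \bigcup_{x.A \in Q}\ \bigcup_{B \in \mathcal{OBJS}_f} \mathcal{DEP}_B(x.A) \Big)$$

$\rightarrow_{\mathcal{L}}$ is any ordering of the requests.

Figure 8.2: Log addition under grouping consistency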


$$delete_Q(\mathcal{L}_f, \rightarrow_f) = (\mathcal{L}, \rightarrow_{\mathcal{L}})$$

where

$$\mathcal{L} = \{\, x.A \in \mathcal{L}_f \mid x.A \notin Q \ \land\ \nexists\, y.B \in Q : y.B =_{\mathcal{R}} x.A \,\}$$

$$\forall x.A, y.B \in \mathcal{L} : (x.A \rightarrow_{\mathcal{L}} y.B) \Leftrightarrow (x.A \rightarrow_f y.B)$$


Figure 8.3: Log deletion under grouping consistency

When explicit dependency information is not available to the transformations,
dependency estimates can be used to preserve consistency. The changes necessary
to use estimates in the log transformations are left to the reader.
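
For the case where explicit grouping information is available, the transformations of figures 8.2 and 8.3 can be sketched in Python as follows. The sketch assumes a dictionary group_of that maps each request to its complete group (including the request itself) and represents a log as a list; these representations and the function names are assumptions of this sketch.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Request = Tuple[str, str]  # (request id, object name)
Groups = Dict[Request, FrozenSet[Request]]


def dependents(request: Request, obj: str, group_of: Groups) -> Set[Request]:
    """Definition 8.4: requests on obj that are grouped with request."""
    return {r for r in group_of[request] if r[1] == obj}


def add(log: List[Request], q: Iterable[Request],
        managed: Set[str], group_of: Groups) -> List[Request]:
    """Figure 8.2: add q plus all grouped requests on managed objects."""
    q = list(q)
    wanted: Set[Request] = set(q)
    for req in q:
        for obj in managed:
            wanted |= dependents(req, obj, group_of)
    # Any ordering is acceptable; new entries are appended in sorted order.
    return log + sorted(wanted - set(log))


def delete(log: List[Request], q: Iterable[Request],
           group_of: Groups) -> List[Request]:
    """Figure 8.3: remove q and everything grouped with a request in q."""
    doomed: Set[Request] = set()
    for req in q:
        doomed.add(req)
        doomed |= group_of[req]
    return [r for r in log if r not in doomed]  # surviving order is kept
```

In the airline example, adding the first-leg reservation at a server of both flights also adds the connecting reservation, and deleting either reservation removes both.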

8.3  Estimating  Dependencies 

Our estimates of request groupings are divided into two classes: basic and
compound. As before, the compound estimates are more accurate and more often
defined than the basic estimates, but are also more expensive to compute.
However, all estimates have the property that they do not under-estimate the true
set of grouped requests. That is, all of the estimates are sound.

We  assume  that  the  estimates  have  access  to  a  potential  dependency  relation 
that  relates  pairs  of  potentially  dependent  objects.  Like  the  potential  dependency 
relation  under  causal  consistency,  this  relation  should  not  under-estimate  the  true 
set  of  related  objects. 

Definition  8.5 

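A potential dependency relation, $\approx_{\mathcal{R}}$, over request structure $(\mathcal{R}, =_{\mathcal{R}})$, is a
binary relation on the objects in $\mathcal{OBJS}$ with the property that it relates all pairs
of objects between which dependencies hold.

$$\forall x.A, y.B \in \mathcal{R} : x.A =_{\mathcal{R}} y.B \Rightarrow A \approx_{\mathcal{R}} B$$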

8.3.1  Basic  Estimates 

The basic estimates are designed to search individual server logs for evidence of
request groupings. We begin by presenting an estimate of when two requests are
not grouped. This estimate is then used to construct an estimate of the complete
set of (grouped) dependents of a request.

Consider the problem of estimating when two requests, x.A and y.B, are not
grouped. Because server logs are consistent with the request structure of an
application, we know that the requests are not grouped if a server of objects A
and B has logged one request, but not the other. Because the states of active
objects are consistent with the application's request structure, we also know that
x.A and y.B are not grouped if both objects are active, but only one of the
requests is reflected in its object's active state. Combining these observations
with the knowledge provided by the potential dependency relation we derive the
following estimate.

Definition  8.6 

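Let $(\mathcal{R}, =_{\mathcal{R}})$ be a request structure, let $\approx_{\mathcal{R}}$ be a potential dependency relation
consistent with $(\mathcal{R}, =_{\mathcal{R}})$, and let S be a system state consistent with $(\mathcal{R}, =_{\mathcal{R}})$.
The request grouping, $x.A = y.B$, is directly contradicted in state S, denoted
$con^0_S(x.A = y.B)$, if any of the following three conditions holds:

1. $A \not\approx_{\mathcal{R}} B$

2. $\exists f \in \mathcal{FUNC}_{S/A} \cap \mathcal{FUNC}_{S/B} :$

   $[(x.A \in \mathcal{L}_{S/f} \land y.B \notin \mathcal{L}_{S/f}) \lor (y.B \in \mathcal{L}_{S/f} \land x.A \notin \mathcal{L}_{S/f})]$

3. $\mathcal{ACT}_{S/A} \neq \emptyset \land \mathcal{ACT}_{S/B} \neq \emptyset \ \land$

   $[(x.A \in \mathcal{AS}_{S/A} \land y.B \notin \mathcal{AS}_{S/B}) \lor (y.B \in \mathcal{AS}_{S/B} \land x.A \notin \mathcal{AS}_{S/A})]$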
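
The three conditions of definition 8.6 translate directly into a predicate. The Python sketch below is an illustration under stated assumptions: logs maps each functioning server to the set of requests it has logged, serves maps each server to the objects it manages, active_state contains an entry only for active objects, and the potential dependency relation is treated as symmetric. All of these names are hypothetical.

```python
from typing import Dict, Set, Tuple

Request = Tuple[str, str]  # (request id, object name)


def contradicted(x: Request, y: Request,
                 potential: Set[Tuple[str, str]],
                 logs: Dict[str, Set[Request]],
                 serves: Dict[str, Set[str]],
                 active_state: Dict[str, Set[Request]]) -> bool:
    """True if the grouping of x and y is directly contradicted."""
    a, b = x[1], y[1]
    # Condition 1: the objects are not potentially dependent.
    if (a, b) not in potential and (b, a) not in potential:
        return True
    # Condition 2: a functioning server of both objects logged exactly one.
    for server, logged in logs.items():
        serves_both = a in serves[server] and b in serves[server]
        if serves_both and ((x in logged) != (y in logged)):
            return True
    # Condition 3: both objects are active and exactly one is reflected.
    if a in active_state and b in active_state:
        if (x in active_state[a]) != (y in active_state[b]):
            return True
    return False
```

A result of False does not prove that the two requests are grouped; it only means that no evidence in the current state rules the grouping out.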

Now consider the problem of estimating the complete set of object B requests
grouped with request x.A. If a server of objects A and B has logged request
x.A, then its log must also contain all of the object B dependents of x.A. The
set of object B requests in its log can therefore be used as an estimate of the
dependency set. Additionally, if objects A and B are both active, and the state
of A reflects request x.A, then the state of B must reflect all of the dependents.
The set of requests reflected in the state of B can therefore also be used as an
estimate of the dependency set. Combining these approximations along with
the information in the preceding estimate, we derive the following estimate of
$\mathcal{DEP}_B(x.A)$.


Definition  8.7 

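Let $(\mathcal{R}, =_{\mathcal{R}})$ be a request structure, let $\approx_{\mathcal{R}}$ be a potential dependency
relation consistent with $(\mathcal{R}, =_{\mathcal{R}})$, and let S be a system state observably
consistent with $(\mathcal{R}, =_{\mathcal{R}})$. For any object $B \in \mathcal{OBJS}$ and request $x.A \in \mathcal{R}$, the
basic estimated dependents of x.A are:

$$dep^0_{S/B}(x.A) = \begin{cases}
\bot & \text{if } \neg\exists f \in \mathcal{FUNC}_{S/A} \cap \mathcal{FUNC}_{S/B} : x.A \in \mathcal{L}_{S/f} \\
     & \text{and } \mathcal{ACT}_{S/A} = \emptyset \lor \mathcal{ACT}_{S/B} = \emptyset \lor x.A \notin \mathcal{AS}_{S/A} \\[4pt]
\emptyset & \text{if } B \not\approx_{\mathcal{R}} A \\[4pt]
\{\, y.B \mid \neg con^0_S(y.B = x.A) \ \land & \text{o.w.} \\
\quad [\, \exists f \in \mathcal{FUNC}_{S/A} \cap \mathcal{FUNC}_{S/B} : x.A, y.B \in \mathcal{L}_{S/f} & \\
\quad \lor\ y.B \in \mathcal{AS}_{S/B} \,] \,\} &
\end{cases}$$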


8.3.2  Compound  Estimates 


The  information  necessary  to  detect  a  request  grouping  may  be  distributed  across 
multiple  logs.  For  example,  suppose  that  there  is  a  grouping  between  n  different 
requests. 

$$x_1.A_1 =_{\mathcal{R}} x_2.A_2 =_{\mathcal{R}} \dots =_{\mathcal{R}} x_n.A_n$$

This grouping may embed itself across n - 1 logs in the following way.

    Log 1        Log 2        ...    Log n-1
    $x_1.A_1$    $x_2.A_2$    ...    $x_{n-1}.A_{n-1}$
    $x_2.A_2$    $x_3.A_3$    ...    $x_n.A_n$

Using the basic estimates, we would detect each of the individual grouping pairs:

$$x_1.A_1 =_{\mathcal{R}} x_2.A_2 \qquad x_2.A_2 =_{\mathcal{R}} x_3.A_3 \qquad \dots \qquad x_{n-1}.A_{n-1} =_{\mathcal{R}} x_n.A_n$$

In  order  to  detect  the  overall  grouping  between  the  n  requests,  the  results  of 
the  basic  estimates  must  be  combined.  This  can  be  done  using  the  compound 
estimates  of  chapter  6.  By  substituting  the  preceding  basic  estimates  for  those  of 
chapter  6,  the  compound  estimates  will  approximate  request  groupings  instead 
of  causal  dependencies.  No  other  modifications  are  required  to  the  compound 
estimates. 

8.4  Summary 

This  chapter  outlined  modifications  to  the  recovery  mechanism  for  supporting  a 
new  form  of  consistency  called  grouping  consistency.  Under  grouping  consistency, 
requests  were  collected  into  sets  with  the  property  that  no  request  in  a  set  was 
reflected  in  the  system  unless  all  requests  in  the  set  were  reflected. 

The  recovery  sequence  of  a  server  remained  the  same  as  it  was  under  causal 
consistency.  During  the  JOIN  phase,  a  recovering  server  restored  its  replicas  of 
active  objects  to  those  objects’  current  states.  During  the  ACTIVATE  phase, 
a  recovering  server  restored  its  replicas  of  inactive  objects  to  states  consistent 
with  the  rest  of  the  system.  However,  the  log  transformations  out  of  which 
the  recovery  algorithms  are  built  had  to  be  modified  to  account  for  the  new 
consistency  definition. 

When  explicit  information  about  the  groupings  of  requests  was  unavailable, 
the  log  transformations  could  use  estimates  of  the  groupings  in  order  to  preserve 
consistency  in  the  system.  These  estimates  were  divided  into  two  classes:  basic 
and  compound.  The  compound  estimates  remained  the  same  as  they  were  in 
chapter  6.  However,  the  basic  estimates  out  of  which  they  are  built  were  redefined 
to  approximate  grouping  dependencies  instead  of  causal  dependencies. 


Chapter  9 
Conclusions 


This dissertation has presented a recovery mechanism for restoring causally
consistent states to replicated data objects. The mechanism was based on maintaining
logs of the updates that occur to objects, and using those logs to reconstruct
object states after failures. Unlike existing techniques, our method does not
require any explicit information about the dependencies between updates. Instead,
any necessary information about the ordering between requests is inferred from
their orderings within logs.

Without  a  recovery  mechanism,  two  types  of  inconsistencies  develop  in  a 
system.  First,  inconsistencies  develop  between  the  different  replicas  of  an  object. 
When a server of a replica recovers from a failure, its log reflects the state of
the  object  from  the  time  of  the  failure.  If  the  state  of  the  object  has  changed 
since  the  failure,  the  server  will  restore  an  outdated  state  to  its  replica.  Second, 
inconsistencies  develop  between  the  states  of  different  objects.  When  all  servers  of 
an  object  fail,  some  updates  on  the  object  may  be  lost.  The  state  later  recovered 
by  the  servers  may  then  be  missing  some  requests  on  which  other  active  objects 
depend. 

Based on these two types of inconsistencies, the recovery sequence of a server
is divided into two phases. During the JOIN phase, a recovering server restores
its replicas of active objects. The current states of these objects are transferred
to the server and written to its log. During the ACTIVATE phase, a server
restores  its  replicas  of  inactive  objects.  All  recovering  servers  of  an  inactive 
object  cooperate  in  choosing  a  new  state  for  the  object  that  is  consistent  with 
the  states  of  the  other  objects  in  the  system.  Once  chosen,  the  servers  modify 
their  logs  to  reflect  this  new  state. 

The  algorithms  implementing  the  JOIN  and  ACTIVATE  phases  are  relatively 
straightforward. The only difficulty involves preserving the consistency of a
server’s  log  when  modifications  are  made  to  it.  The  log  addition  transformation 
ensures that no request is added to a server’s log without all of the requests on which it depends.
The  log  deletion  transformation  ensures  that  no  request  is  deleted  from  a  log 
without  also  removing  all  requests  that  depend  on  it. 
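
The intent of the two transformations can be sketched as follows, under the simplifying assumption that the dependencies are known exactly and are acyclic. The function names and the `deps` mapping (request to the requests it directly depends on) are hypothetical and are not the definitions used in the earlier chapters.

```python
def log_add(log, request, deps):
    """Return a new log containing `request` and, ahead of it, every
    missing request on which it (transitively) depends."""
    new_log = list(log)
    present = set(log)

    def visit(r):
        if r in present:
            return
        present.add(r)
        # Dependencies are appended before the request that needs them.
        for d in sorted(deps.get(r, ())):
            visit(d)
        new_log.append(r)

    visit(request)
    return new_log


def log_delete(log, request, deps):
    """Return a new log without `request` and without any request that
    (transitively) depends on it."""
    removed = {request}
    changed = True
    while changed:  # repeat until no further dependent request is found
        changed = False
        for r in log:
            if r not in removed and any(d in removed for d in deps.get(r, ())):
                removed.add(r)
                changed = True
    return [r for r in log if r not in removed]
```

With `deps = {"b": ["a"], "c": ["b"]}`, adding `c` to the log `["d"]` pulls in `a` and `b` first, and deleting `a` from `["a", "b", "c", "d"]` leaves only `d`.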

When explicit information about request dependencies is not available, the recovery
algorithms (as well as the log transformations out of which they are built)
can  use  estimates  of  the  dependencies.  In  order  to  preserve  consistency  in  the 
system,  these  estimates  must  have  the  property  that  they  do  not  under-estimate 
the  orderings  between  requests.  We  presented  several  dependency  estimates  with 
this property. The basic estimates are simple approximations based on searching
server logs for evidence of request orderings. The compound estimates are
more complicated approximations formed by combining the results of the basic
estimates. Although the compound estimates are more accurate and more often
defined than the basic estimates, they are also more expensive to compute.
We showed that in a special class of systems (the backward inclusion systems)
the  inexpensive  basic  estimates  can  always  be  used  without  the  possibility  of 
blocking. 
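
To make the idea of a log-based estimate concrete, the following sketch shows one possible reading of a basic estimate. It is an illustration only and not the definition given earlier in the dissertation: the function name `basic_estimate` and the three-valued result are assumptions. The estimate errs on the side of reporting an ordering, so that it never under-estimates, and it is undefined when no single log contains both requests.

```python
def basic_estimate(r1, r2, logs):
    """Conservative estimate of whether r2 may depend on r1.

    logs: mapping from server name to that server's ordered list of requests.

    Returns True  if some log shows r1 before r2 (assume a dependency),
            False if r1 and r2 share a log but r1 never precedes r2,
            None  if no log contains both requests (estimate undefined).
    """
    seen_together = False
    for log in logs.values():
        if r1 in log and r2 in log:
            seen_together = True
            if log.index(r1) < log.index(r2):
                return True  # evidence of an ordering: never under-estimate
    return False if seen_together else None
```

An undefined result (`None`) corresponds to the case where a recovering server would either have to block or fall back on a more expensive compound estimate.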

Our  basic  recovery  approach  can  also  be  applied  to  forms  of  consistency  other 
than causal consistency. We showed that with little modification, our recovery
technique  could  be  applied  to  an  atomic  form  of  consistency  called  grouping 
consistency. Particularly interesting was the fact that the compound estimates
remained  unchanged  between  causal  and  grouping  consistency.  Only  the  basic 
estimates  needed  to  be  changed  to  allow  for  the  new  consistency  definition. 

9.1  Future  Work 

We conclude this dissertation by discussing several related areas for future
research.

9.1.1  Implementation  Considerations 

A  recovery  mechanism  based  on  the  ideas  in  this  dissertation  was  implemented 
in the ISIS system [BCJ+]. In ISIS, the server set of an object is implemented as
a  process  group.  Each  process  in  a  group  is  equivalent  to  one  server  and  manages 
one  replica  of  the  object.  Process  groups  in  ISIS  are  given  unique  names.  Updates 
on  an  object  can  be  broadcast  to  the  group  using  only  the  group  name.  When 
such  a  broadcast  occurs,  ISIS  automatically  resolves  the  name  of  the  group  into 
its  current  set  of  member  processes  and  delivers  a  copy  of  the  update  broadcast 
to  each  member. 

Unfortunately,  the  exact  recovery  mechanism  described  in  this  dissertation 
could  not  be  implemented  in  ISIS  because  of  the  way  in  which  ISIS  handles 
process  groups.  When  a  process  (server)  recovers  in  ISIS,  it  is  required  to  re-join 
the  process  groups  (object  server  sets)  that  it  previously  belonged  to  in  a  fixed 
order that is set at the time the application is written. However, the recovery
sequence  presented  in  chapter  3  requires  a  recovering  server  to  join  object  groups 
in  flexible  orders.  When  a  server  recovers,  it  must  first  JOIN  the  server  sets  of 
all  objects  that  are  currently  active  (whatever  they  are)  and  then  ACTIVATE 
its  replicas  of  objects  that  are  inactive.  We  believe  that  ISIS  could  be  made 
to  support  processes  joining  process  groups  in  flexible  orders.  However,  the 
modifications  would  require  substantial  revision  of  the  code,  and  our  current 
applications do not require such support.

Like the recovery mechanism described in this dissertation, the recovery mechanism
in ISIS automatically ensures consistency between the replicas of an object.
However, the ISIS recovery mechanism does not provide automatic consistency
between the states of different objects. Instead, it ensures that the state of an
inactive object is always recovered using the log of the last server of the object
to fail [Ske85]. By allowing clients to force certain updates to be logged by all
functioning  servers  of  an  object,  clients  can  control  which  updates  may  be  lost 
from  the  system,  and  therefore  control  consistency  in  the  system. 

Beyond the ability to join process groups in flexible orders, ISIS should provide
a good platform on which to build the recovery mechanism described in
this  dissertation.  ISIS  currently  supports  a  state  transfer  mechanism  whereby  a 
server  (process)  joining  or  re-joining  an  active  object  server  set  (process  group) 
is automatically transferred the current state of the object (process group). This
state  transfer  appears  atomic  from  the  point  of  view  of  a  client,  so  each  update 
broadcast  to  the  object  (process  group)  is  processed  by  all  of  its  members  in  the 
same  state  of  the  object  (process  group).  This  state  transfer  mechanism  is  used 
by  the  current  ISIS  recovery  mechanism  to  initialize  replicas  of  active  objects  at 
recovering  servers. 

The ISIS broadcast mechanism also provides a facility for automatically collecting
replies to message broadcasts, including the handling of failures during the
broadcast-reply sequence. This facility should prove invaluable in the dissemination
and collection of basic dependency information. For example, a recovering
process requiring dependency information about certain updates could broadcast
a request to the servers of the objects involved. Upon receiving the request, the
servers could reply with the current states of the objects and ordering information
from their logs. Using simple unions and intersections, the recovering process
could then combine this information to form the necessary estimates. This type
of mechanism would be sufficient for building backward inclusion systems, where
only  basic  dependency  information  is  required. 
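
The combination step might look like the sketch below. It does not use or model the ISIS interface: the `Reply` record, `reply_from_log`, and `combine` are hypothetical names, a reply of `None` stands in for a server that failed before answering, and the particular use of union and intersection is one plausible choice rather than the one prescribed by the dissertation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reply:
    """Hypothetical reply from one server to a dependency request."""
    requests: frozenset  # requests present in the server's log
    before: frozenset    # pairs (r1, r2) with r1 preceding r2 in that log


def reply_from_log(log):
    """Build the reply a server could send from its ordered log."""
    pairs = frozenset(
        (log[i], log[j])
        for i in range(len(log))
        for j in range(i + 1, len(log))
    )
    return Reply(frozenset(log), pairs)


def combine(replies):
    """Combine the replies collected by a recovering process.

    The union of the observed orderings is a conservative view of which
    requests may precede which; the intersection of the request sets gives
    the requests logged by every server that replied.
    """
    live = [r for r in replies if r is not None]  # drop failed servers
    if not live:
        return frozenset(), frozenset()
    may_precede = frozenset().union(*(r.before for r in live))
    logged_everywhere = frozenset.intersection(*(r.requests for r in live))
    return may_precede, logged_everywhere
```

Because the orderings are merged by union, a missing reply can only make the combined view less complete; it cannot introduce an ordering that no log supports.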

This technique could also be used to compute the compound estimates. However,
doing so would be costly, not only in terms of time, but also in terms of
space  and  message  traffic.  In  order  to  form  the  compound  estimates  needed  for 
recovery,  a  server  must  collect  basic  estimates  from  the  logs  of  many  different 
servers.  This  collection  process  can  potentially  create  a  large  load  of  message 
traffic  at  the  recovering  server.  Further,  once  the  basic  estimates  are  collected, 
the  server  must  combine  them  to  form  the  compound  estimates.  If  the  potential 
dependency  relation  contains  long  chains,  this  could  require  significant  time  and 
space. 

In  order  to  reduce  the  time,  space,  and  message  load  at  a  recovering  server,  the 
task  of  computing  estimates  could  be  distributed  across  the  functioning  servers 
in  the  system.  Each  functioning  server  could  locally  compute  the  basic  estimates 
related  to  the  objects  it  manages.  This  would  introduce  only  a  limited  amount  of 
message traffic at each server. Once the basic estimates are computed, the functioning
servers could exchange their results and combine them in a hierarchical
fashion  in  order  to  form  the  overall  compound  estimates. 
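
A hierarchical combination of this kind can be sketched as a pairwise, tree-shaped merge. The sketch assumes that each server's local result is a set of (earlier, later) request pairs and that two results are combined by taking the transitive closure of their union; both are assumptions made for illustration, and the compound estimates of chapter 6 may be defined differently.

```python
def close(pairs):
    """Transitive closure of a set of (earlier, later) pairs."""
    closed = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closed):
            for (c, d) in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed


def merge(left, right):
    """Combine the results of two servers (or two subtrees)."""
    return close(left | right)


def hierarchical_combine(local_estimates):
    """Merge per-server results pairwise, one tree level at a time."""
    level = [close(e) for e in local_estimates]
    while len(level) > 1:
        nxt = [merge(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired result moves up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0] if level else set()
```

With n functioning servers the merge takes about log2(n) rounds, and no single server has to receive the basic estimates of all the others.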

9.1.2  Other  Consistency  Forms 

We  have  described  variants  of  our  recovery  mechanism  for  implementing  both 
causal  consistency  and  grouping  consistency.  An  interesting  problem  is  whether 
these  variants  can  be  combined  to  implement  serializable  consistency.  Grouping 
consistency  provides  the  all-or-none  property  required  by  serializability.  Causal 
consistency  might  then  be  added  to  implement  some  type  of  ordering  between 
the  requests  in  different  groups. 

A related problem concerns the types of consistency that can be enforced using
our basic mechanism. We would like to characterize the forms of consistency
implementable using dependency estimates. The compound estimates of chapter 6
apply equally well to both causal and grouping consistency. The question
then naturally arises as to whether these estimates apply to more general forms
or classes of consistency.

Figure 9.1: Logs generating non-optimal estimates

9.1.3  Optimal  Estimates 

The  compound  estimates  of  chapter  6  are  not  optimal  in  the  sense  that  they  may 
occasionally  yield  an  ordering  between  two  requests,  even  when  there  is  evidence 
available  in  the  system  to  contradict  the  ordering.  For  example,  consider  the  set 
of logs shown in figure 9.1. This figure depicts the logs of six servers ($f_1$, $f_2$, $f_3$,
$f_4$, $f_5$, and $f_6$), each server managing only those objects for which requests are
shown  in  its  log.  Suppose  that  the  potential  dependency  relation  in  this  system 
forms  one  long  chain. 

E  D  C  B  A

Applying the compound estimates to these logs, the estimates would yield an
ordering between requests a.A and e.E.

e.E ~< a.A

However, from the logs we can determine that this ordering is not possible. Any
dependency of request a.A on request e.E must occur along the chain of objects
depicted above (in the potential dependency relation). From the log of server $f_1$,
we know that any such dependency would include either request $b_1.B$ or $b_2.B$. If
the dependency included request $b_1.B$, then from the log of server fi we know
that it must also include request $c_1.C$. This implies that a.A is dependent on
$c_1.C$. But this ordering is contradicted by the log of server f&. Similarly, if the
dependency chain includes request $b_2.B$, then from the log of server $f_4$ we know
that it also includes $d_2.D$. This implies that request a.A is dependent on request
$d_2.D$. But this ordering is also contradicted by the log of server f&.

An  interesting  problem  would  be  to  determine  an  optimal  set  of  dependency 
estimates  that  yield  an  efficient  implementation.  As  we  pointed  out  earlier,  the 
compound  estimates  apply  equally  well  to  both  causal  and  grouping  consistency. 
We  would  like  to  find  an  optimal  set  of  estimates  that  also  have  this  property, 
preferably  extending  to  other  consistency  forms  as  well.  Because  it  has  not  been 
the  goal  of  this  dissertation  to  pursue  complexity  issues,  we  will  not  make  any 
general speculations about the difficulty of computing an optimal set of estimates.
We  would  like  to  point  out,  however,  that  the  problem  of  determining  an  optimal 
set  of  estimates  is  reminiscent  of  other  optimality  results  in  the  literature  that 
have  been  shown  to  be  NP-complete  [Pap79]. 